Reorder Buffer commit - cpu-architecture

In an out-of-order execution processor with a reorder buffer (ROB) and branch speculation, I understand that changes are not made to architectural state until the ROB commits them.
The functional units (FUs) write their results onto, let's say, a common data bus (CDB) and into the ROB when they finish executing. The ROB can then decide whether the branch was correctly predicted; if so it commits, otherwise the ROB is flushed.
What I don't understand is what happens to the reservation station updates that came from the CDB broadcasts of the FUs: shouldn't they be flushed or rolled back somehow?
An example I came up with (maybe not the best):
addi $s1, $zero, 8   // $s1 = 8
addi $s2, $zero, 9   // $s2 = 9
addi $s3, $zero, 0   // $s3 = 0
bne  $s1, $s2, L1    // taken, since $s1 != $s2
addi $s3, $s3, 1     // first increment (only on the not-taken path)
L1:
addi $s3, $s3, 2     // second increment
$s3 is initialized to 0,
then a misprediction of the branch causes $s3 to be increased by 1. The result is broadcast to the RSs.
The second increment (adding 2) is now ready to start executing with that value. At the same time the branch misprediction is detected and the ROB is flushed; however, the reservation station of the adder is not changed, so $s3 now carries a wrong value that was never in the ROB to be flushed.
How is this solved?
I suspect that I am missing something crucial. I am not very experienced; I just graduated and came up with this while revising.

While the branch is still speculative (in your example, predicted not taken), the update to $s3 (the first increment) will not update the architectural register $s3; that data is buffered in the reorder buffer. When the misprediction is detected, the reorder buffer is flushed. The bottom line, to the best of my knowledge, is that the architectural register is not updated until the branch is resolved, which is exactly why the reorder buffer is used.

As I understand it, on Intel CPUs at least, the ROB tracks all in-flight instructions inside the out-of-order part of the pipeline, including ones that are still in the RS waiting to execute. So instructions are added to both the ROB and the RS when they issue into the out-of-order part of the core.
I think this design is nearly universal. You're right that you need to be able to keep track of every instruction that's still speculative somehow.
Also, all instructions that depend on instructions discovered to have been mis-speculated need to be flushed. It's certainly easier to flush the whole ROB as you describe, and get back to the last non-speculative state.
So even though the second add still needs to be executed, the mis-speculation means its input might be wrong. So it needs to be flushed, to avoid exactly the problem you're talking about.
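
To make that concrete, here is a toy Python model, purely illustrative and not any real microarchitecture: each instruction gets a ROB entry and a reservation-station entry when it issues, the RS entry is tagged with its ROB id, and a misprediction flush discards both the younger ROB entries and the RS entries tagged with them. In your example both increments are younger than the branch, so the second add's RS entry (holding the stale $s3 value) is squashed too and the instruction is re-fetched down the correct path.

class OoOCore:
    # Toy model: issuing an instruction allocates a ROB entry and an RS entry
    # together, so a branch flush can discard both.

    def __init__(self):
        self.rob = []   # one dict per in-flight instruction, in program order
        self.rs = []    # reservation-station entries waiting to execute

    def issue(self, op, source_tags):
        # Allocate a ROB id and an RS entry that waits on source_tags
        # (the ROB ids of the producing instructions).
        rob_id = len(self.rob)
        self.rob.append({"op": op, "done": False})
        self.rs.append({"rob_id": rob_id, "op": op, "waiting_on": set(source_tags)})
        return rob_id

    def broadcast(self, rob_id):
        # CDB broadcast: mark the producer done and wake up consumers.
        self.rob[rob_id]["done"] = True
        for entry in self.rs:
            entry["waiting_on"].discard(rob_id)

    def flush_after(self, branch_rob_id):
        # Misprediction: squash everything younger than the branch in the ROB
        # *and* in the reservation stations, since RS entries carry ROB ids.
        self.rob = self.rob[: branch_rob_id + 1]
        self.rs = [e for e in self.rs if e["rob_id"] <= branch_rob_id]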

Is there a way to add code to an infinite z80 assembly loop?

A while ago, I asked what the fastest infinite loop was on a TI-84. One of the answers I got involved using an assembly infinite loop with this code:
AsmPrgm
18FE
However, this is a bit impractical, because it can only be exited with the reset button and doesn't run anything inside it.
Is there a way to put TI-Basic code inside of this loop and/or make it exit conditionally?
Here is the link to the original question and answer:
What is the fastest infinite loop in TI-84+ Basic?
$18FE is jr -2, which jumps two bytes backwards, onto itself. You'll want the additional logic to come after the start of the loop to let you escape (e.g. checking for button presses), then just have it loop back to that label. To do that, you'd need to adjust the $FE value, as that's the distance to jump. It's a signed 8-bit value, so make sure you get all your conditional code in, then branch back by a displacement that depends on the number of bytes you've used.
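For what it's worth, the arithmetic for that displacement byte is easy to check; this is just a small helper I wrote for illustration, not anything TI-specific:

def jr_displacement(body_bytes):
    # Displacement byte for a jr that loops back over body_bytes bytes of
    # loop body placed before it.  jr is relative to the address *after* its
    # own two bytes, so the jump distance is -(body_bytes + 2).
    disp = -(body_bytes + 2)
    if not -128 <= disp <= 127:
        raise ValueError("loop body too large for a relative jr")
    return disp & 0xFF          # two's-complement encoding of the signed byte

print(hex(jr_displacement(0)))   # 0xfe -> the $18FE (jr -2) tight loop
print(hex(jr_displacement(4)))   # 0xfa -> jr back over 4 bytes of loop body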
Regarding your original (linked) question, jr $ is not the fastest loop possible on Z80, as the fastest one is jp $ (actually jp (hl)), where $ denotes the address of the current instruction.
The fastest exitable loop could be done in three ways, depending on what your definition of 'loop' is and how the loop should be exited:
Use interrupts to quit the abovementioned loop: in this case, you should unwind the stack in the interrupt handler (remove the return address) and jump elsewhere.
Use a loop like this:
IN reg,(C)
JP cc,$-2
where the IN reg,(C) instruction also sets the S (sign), Z (zero) and P/V (parity) flags depending on the value read from the port, and JP cc uses one of those flags to either continue the loop or leave it.
Use HALT and exit it naturally with an interrupt.
It is known that the Z80 executes HALT by continuously fetching the byte following the HALT instruction from memory, ignoring it, and repeating this until an interrupt is caught. This behaviour can be described as looping until the interrupt arrives. The root cause of this behaviour is that the Z80 naturally performs a DRAM refresh on every opcode fetch, and this way the refresh keeps running during HALT execution.
You can definitely make assembly programs exit conditionally. The opcode C9 is ret (return), so if you have a program consisting of only AsmPrgmC9, running it as an assembly program will have it finish instantly (it will look the same as running a program with nothing in it). If you want to end the loop when some condition is met, then you'll need to start learning assembly, as the answer will vary widely depending on what that condition is and what OS version/calculator you're using.

Implementing a three-way diff/merge

I'm trying to implement a three-way diff/merge algorithm (in Python) between, say, the base version X and two different derivative versions A and B, and I'm having trouble figuring out how to handle some changes.
I have a line-by-line diff from X to A, and from X to B. These diffs give, for each line, an "opcode" which is = if the line didn't change, + if a line was added, - if a line was removed, or c if the line was changed (which is simply a - immediately followed by a +, indicating the line was removed and then replaced, i.e. effectively modified).
Now I'm comparing corresponding opcodes from the A-diff and the B-diff to try to decide how to merge them. Some of these opcode combos are easy: = and = means neither version changed the line, so we keep the original. + and = means that a line was added on one side and no change was made on the other, so we accept the addition and advance to the next line only on the side that added the line. And - and c is a conflict that the user must resolve, because on one side a line was changed and on the other side the same line was removed.
However, I'm struggling with what to do with a + and a -, or a + and a c. In the first case, for instance, I added a new line on one side and deleted a subsequent line on the other side. Strictly, I don't think this is a conflict, but what if the addition was relying on that line being there? I guess that applies to the entire thing (something added in one place may rely on something somewhere else to make sense). The second case is similar: I added a line on one side, and on the other side I changed a subsequent line, but the addition may be relying on the original version of the line.
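To make the cases concrete, here is roughly how I'm modelling the combination table so far; the function and return labels are just my own names for illustration, and the last branch is the part I don't know how to fill in:

def combine(op_a, op_b):
    # Decide what to do for one pair of opcodes from the A-diff and B-diff.
    ops = {op_a, op_b}
    if ops == {"="}:                          # neither side touched the line
        return "keep_original"
    if ops == {"+", "="}:                     # addition on one side only:
        return "take_addition"                # accept it, advance only that side
    if ops == {"-", "c"}:                     # one side deleted, the other changed
        return "conflict"
    if ops in ({"+", "-"}, {"+", "c"}):       # the combinations I'm asking about
        return "unclear"
    # (other combinations omitted for brevity)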
What is the normal approach to handling this?
The usual robust strategy (diff3, git's resolve, ...) is that changes (+, -, c) in one file must be at least N (e.g. 3) context lines away from competing changes in the other file, unless the two changes are exactly equal. Otherwise it's a conflict for manual resolution. This is similar to the requirement for some clean context when applying a patch.
Here is an example where somebody tries some fancy extra strategies, like "if two Delete actions overlap, take the union of their ranges", to reduce certain conflicts. But that's risky; and, on the other hand, there is no guarantee that concurrent changes do not cause problems even when they are very far apart.
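A minimal sketch of that separation rule, assuming both diffs have already been collapsed into chunks of changed line ranges against X (the chunk format and names are mine, not diff3's or git's):

N_CONTEXT = 3   # minimum clean base lines required between competing changes

def competing(chunk_a, chunk_b):
    # Decide whether two changed regions of the base file X compete.
    # Each chunk is (start, end, replacement_lines): a half-open line range
    # in X plus what that side wants it to become.
    (a_start, a_end, a_lines) = chunk_a
    (b_start, b_end, b_lines) = chunk_b
    if chunk_a == chunk_b:
        return False                       # both sides made the identical change
    gap = max(b_start - a_end, a_start - b_end)
    return gap < N_CONTEXT                 # too close together: manual conflict

# Example: an addition in A around line 10 and a deletion in B around line 40
# are far enough apart to auto-merge; the same pair at lines 10 and 11 conflicts.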

How to prevent tcpreplay from printing warning information?

I'm trying to replay a large pcap file, but tcpreplay keeps printing "Warning: Packet #50579 has gone back in time". That will affect the efficiency. Is there a way to stop this?
As of tcpreplay 3.4.4, you cannot completely silence this warning, but it's trivial to change the code to do so -- look at the definitions in src/common/err.h, and change the warnx() definition to become a no-op when debugging is disabled, similarly to dbg() and dbgx().
However, you should verify whether the output indeed affects packet throughput. I doubt it, particularly if it only affects a single packet.

Is simplified semantics for the 'blame' command a good thing?

I'm working on a new weave-based data structure for storing version control history. This will undoubtedly cause some religious wars about whether it's The Right Way Of Doing Things when it comes out, but that isn't my question right now.
My question has to do with what output blame should give. When a line of code has been added, removed, and merged into itself a number of times, it isn't always clear what revision should get the blame for it. Notably, this means that when a section of code is deleted, all record of it having been there is gone, and there is no blame for the removal. Everyone I've gone over this issue with has said that trying to do better simply isn't worth it. Sometimes people put in the hack that the line after the deleted section has its blame changed from whatever it actually was to the revision in which the section got deleted. Presumably if the section is at the end then the last line gets its blame changed, and if the file winds up empty then the blame really does disappear into the aether, because there's literally nowhere left to put blame information. For various technical reasons I won't be using this hack, but I assume that continuing with this completely undocumented but de facto standard practice will be uncontroversial (but feel free to flame me and get it out of your system).
Moving on to my actual question. Usually in blame for each line you look at the complete history of where it was added and removed in the history and using three-way merge (or, in the case of criss-cross merges, random bullshit) and based on the relationships between those you determine whether the line should have been there based on its history, and if it shouldn't but is then you mark it as new with the current revision. In the case where a line occurs in multiple ancestors with different blames then it picks which one to inherit arbitrarily. Again, I assume that continuing with this completely undocumented but de facto standard practice will be uncontroversial.
Where my new system diverges is that rather than doing a complicated calculation of whether a given line should be in the current revision based on a complex calculation of the whole history, it simply looks at the immediate ancestors, and if the line is in any of them it picks an arbitrary one to inherit the blame from. I'm making this change for largely technical reasons (and it's entirely possible that other blame implementations do the same thing, for similar technical reasons and a lack of caring) but after thinking about it a bit part of me actually prefers the new behavior as being more intuitive and predictable than the old one. What does everybody think?
I actually wrote one of the blame implementations out there (Subversion's current one, I believe, unless someone has replaced it in the past year or two). I helped with some others as well.
At least most implementations of blame don't do what you describe:
Usually in blame for each line you look at the complete history of where it was added and removed in the history and using three way merge (or, in the case of criss-cross merges, random bullshit) and based on the relationships between those you determine whether the line should have been there based on its history, and if it shouldn't but is then you mark it as new with the current revision. In the case where a line occurs in multiple ancestors with different blames then it picks which one to inherit arbitrarily. Again, I assume that continuing with this completely undocumented but de facto standard practice will be uncontroversial.
Actually, most blames are significantly less complex than this and don't bother trying to use the relationships at all; they just walk the parents in some arbitrary order, using simple delta structures (usually the same internal structure whatever diff algorithm they have uses before it turns it into textual output) to see whether a chunk changed, and if so, blame it and mark those lines as done.
For example, Mercurial just does an iterative depth first search until all lines are blamed. It doesn't try to take into account whether the relationships make it unlikely it blamed the right one.
Git does do something a bit more complicated, but still, not quite like you describe.
Subversion does what Mercurial does, but the history graph is very simple, so it's even easier.
In turn, what you are suggesting is, in fact, what all of them really do:
Pick an arbitrary ancestor and follow that path down the rabbit hole until it's done, and if it doesn't cause you to have blamed all the lines, arbitrarily pick the next ancestor, continue until all blame is assigned.
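
For what it's worth, a toy version of that "walk the parents, blame whatever chunk changed, mark the line as done" loop might look something like the following; the history and diff helpers are assumed placeholders, not any particular VCS's API:

def blame(tip, line_count, get_parents, diff_lines):
    # Toy blame: walk ancestors, claim whichever lines each revision changed.
    # get_parents(rev) and diff_lines(parent, rev) are assumed helpers; the
    # latter returns the line numbers rev added or changed relative to parent.
    # Line numbers are pretended to be stable across revisions to keep the
    # sketch short; a real implementation must map them between revisions.
    blamed = {}                               # line number -> revision blamed
    stack = [tip]
    while stack and len(blamed) < line_count:
        rev = stack.pop()
        parents = get_parents(rev)
        if not parents:
            for line in range(line_count):    # root revision owns the rest
                blamed.setdefault(line, rev)
            continue
        first, rest = parents[0], parents[1:] # arbitrary choice of parent path
        for line in diff_lines(first, rev):
            blamed.setdefault(line, rev)      # only claim unowned lines
        stack.extend(rest)                    # other ancestors, visited later
        stack.append(first)                   # but follow the first parent next
    return blamed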
On a personal level, I prefer your simplified option.
Reason: Blame isn't used very much anyway.
So I don't see a point in wasting a lot of time doing a comprehensive implementation of it.
It's true. Blame has largely turned out to be one of those "pot of gold at the end of the rainbow" features. It looked really cool to those of us standing on the ground, dreaming about a day when we could just click on a file and see who wrote which lines of code. But now that it's widely implemented, most of us have come to realize that it actually isn't very helpful. Check the activity on the blame tag here on Stack Overflow. It is underwhelmingly desolate.
I have run across dozens of "blame-worthy" scenarios in recent months alone, and in most cases I have attempted to use blame first, and found it either cumbersome or utterly unhelpful. Instead, I found the information I needed by doing a simple filtered changelog on the file in question. In some cases, I could have found the information using Blame as well, had I been persistent, but it would have taken much longer.
The main problem is code formatting changes. The first-tier blame for almost everything was listed as... me! Why? Because I'm the one responsible for fixing newlines and tabs, re-sorting function order, splitting functions into separate utility modules, fixing comment typos, and improving or simplifying code flow. And if it wasn't me, someone else had done a whitespace or block-move change somewhere along the way as well. In order to get a meaningful blame on anything dating back to a time before I can already remember without the help of blame, I had to roll back revisions and re-blame. And re-blame again. And again.
So in order for blame to actually be a useful time saver in more than the luckiest of situations, it has to be able to heuristically make its way past newline, whitespace, and ideally block copy/move changes. That sounds like a very tall order, especially since scouring the changelog for a single file usually won't yield many diffs anyway, and you can sift through them by hand fairly quickly. (The notable exception being, perhaps, badly engineered source trees where 90% of the code is stuffed into one or two ginormous files... but who in a collaborative coding environment does much of that these days?)
Conclusion: Give it a bare-bones implementation of blame, because some people like to see "it can blame!" on the features list. And then move on to things that matter. Enjoy!
The line-merge algorithm is stupider than the developer. If they disagree, that just indicates that the merger is wrong rather than indicating a decision point. So, the simplified logic should actually be more correct.

Is there some commit reminder for Subversion/etc for Eclipse?

Something that will alert you/force you to commit after editing X number of files, or modifying X number of lines of code, or writing X number of lines of code.
Edit:
There's clearly no feasible way for an automated system to determine if some realizable code chunk is complete, but this would be good enough for me. I don't want to use this as an "autosave" feature but more as a brain jog to remember to commit once I reach a suitable point.
I completely agree with the other answers, but I think one can use this approach if, say, you want to ensure that you make small commits rather than large ones, especially in DVCSs like Git.
I think you can set up a scheduled task or cron job which will hit your working directory and run something like:
svn diff | grep -E "^\+ " | wc -l
and if the count is greater than whatever threshold you deem commit-worthy, you can make it give you a reminder. I don't think you can integrate such a thing into Eclipse.
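A rough sketch of what such a scheduled job could run; the path, threshold, and notification are placeholders to adapt, and the counting is done in Python so the "+++" diff headers can be skipped too:

import subprocess

WORKING_COPY = "/path/to/working/copy"   # placeholder: your checkout
THRESHOLD = 200                          # added lines before you get nagged

def added_line_count(path):
    # Count added lines in `svn diff`, skipping the "+++" file headers.
    diff = subprocess.run(["svn", "diff"], cwd=path,
                          capture_output=True, text=True, check=True).stdout
    return sum(1 for line in diff.splitlines()
               if line.startswith("+") and not line.startswith("+++"))

if __name__ == "__main__":
    count = added_line_count(WORKING_COPY)
    if count > THRESHOLD:
        # Replace with whatever notification suits you (email, notify-send, ...).
        print(f"{count} uncommitted added lines: time to commit?")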
That's not what commits are for. They are not some sort of backup mechanism. You do a commit when a piece of work has reached some state that you want to remember, normally because you are happy with it. It makes no sense at all to do them every X hours or every N lines of code.
Usually it's just a workflow thing and a habit you should get into.
Commits should be related to the work you are doing - so that reverting is meaningful. It's pretty hard for anything else to detect that except you.