Effect of Logical Operations on Carry Flag (8085)

Why does the carry flag get affected by logical operations like AND, OR, and XOR on the Intel 8085 microprocessor?

The carry flag is reset to zero by these logical operations. An 8-bit AND, OR, or XOR can neither produce a carry out of bit 7 nor require a borrow, so the 8085 simply clears CY.
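To see concretely what "these operations do not carry" means, here is a minimal sketch (my own illustration, not from Intel's manual) of how an 8085 emulator might implement ANA; the struct and field names are hypothetical:

#include <stdint.h>

/* Hypothetical emulator state; field names are illustrative. */
typedef struct {
    uint8_t a;   /* accumulator */
    int     cy;  /* carry flag (CY) */
} cpu8085;

/* ANA r: AND the accumulator with a register. An 8-bit AND can
   never produce a carry out of bit 7 or require a borrow, so the
   8085 defines CY := 0 regardless of the operand values. */
static void ana(cpu8085 *cpu, uint8_t r) {
    cpu->a &= r;
    cpu->cy = 0;  /* always cleared */
}

ORA and XRA clear CY in exactly the same way.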
Paging through the 8080/8085 Assembly Language Programming Manual from Intel (© 1978), I cannot find any explanation other than that.
Wouldn't you know it, someone actually scanned this book and put it online. See here. I actually have this book, but the PDF makes it easier to add the image shown below, taken from page 1-10.


OS - Difference between NX bit and Valid-Invalid Bit

What I understand:
Setting the NX bit high for a page will mean that whenever some program tries to execute the code in that page, it will cause a segmentation fault.
There is also a Valid-Invalid bit, and accessing a page marked Invalid will also cause a segmentation fault.
So the question is:
Why is the NX bit not redundant when you already have a valid-invalid bit? What does it mean to mark a page both Valid and NX?
First, I am going to change the terminology slightly. Instead of using terms like valid/invalid, I am going to replace them with present/not present.
A page that is marked as present can be accessed in some fashion (read, write or execute). A page that is marked as not present is not directly accessible (if at all). If it can be accessed, it must first be loaded from some type of backing store (disk/flash/...) into memory.
The NX bit only controls whether the page of memory has execute permissions. Why is controlling the execute permissions important? It helps in locking down the system to help prevent the execution of arbitrary code.
So, a page that is marked as both NX and not present is one that does not have execute permissions, but may need to be loaded from backing store.
If you do not have an NX bit (especially on your data pages), then a clever hacker who figures out how to redirect execution into the data section while in supervisor/kernel mode has a much easier job of running arbitrary code on your system.
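As a user-space illustration of a page that is present (valid) yet non-executable, POSIX mmap can request exactly that combination. This is my own sketch, not part of the original answer, and assumes a Linux-style system with NX support:

#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

int main(void) {
    /* Map one page: present (backed, accessible) but without
       PROT_EXEC, i.e. NX is set in its page-table entry. */
    unsigned char *page = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (page == MAP_FAILED) { perror("mmap"); return 1; }

    memset(page, 0xC3, 4096);            /* fill with x86 RET opcodes */
    printf("read/write OK: %02x\n", page[0]);

    /* Jumping here would fault on an NX-capable system:
       ((void (*)(void))page)();         -- segmentation fault */
    munmap(page, 4096);
    return 0;
}

Uncommenting the call through the function pointer produces the segmentation fault the question describes, even though reads and writes to the very same page succeed.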
Hope this helps.

Implementing a priority queue in MATLAB in order to solve optimization problems using branch and bound

I'm trying to code a priority queue in MATLAB. I know there is a priority queue block in Simulink, but I want to implement it in MATLAB itself. I have pseudocode that uses a priority queue for a method called best-first search with branch and bound. Branch and bound is an algorithm design strategy built around a state space tree, and it is used to solve optimization problems: simple explanation of what is branch and bound
I have read Chapter 5, "Branch and Bound", of Foundations of Algorithms, 4th edition, by Richard Neapolitan and Kumarss Naimipour. The text covers designing algorithms, complexity analysis of algorithms, and computational complexity (analysis of problems); it is a very interesting book, and I came across this pseudocode:
void BeFS(state_space_tree T, number& best)
{
  priority_queue_of_node PQ;
  node u, v;
  initialize(PQ);                        % initialize PQ to be empty
  v = root of T;
  best = value(v);
  insert(PQ, v);                         % insert(PQ, v) adds v to the priority queue PQ
  while (!empty(PQ)) {                   % remove node with best bound
    remove(PQ, v);                       % remove(PQ, v) removes the node with the best bound and assigns it to v
    if (bound(v) is better than best) {  % check if node is still promising
      for (each child u of v) {
        if (value(u) is better than best)
          best = value(u);
        if (bound(u) is better than best)
          insert(PQ, u);
      }
    }
  }
}
I don't know how to code this in MATLAB. Branch and bound is an interesting general technique for finding optimal solutions to various optimization problems, especially in discrete and combinatorial optimization: rather than settling for a heuristic answer, it prunes parts of the search space that cannot contain the optimum, which can reduce calculation time while still finding the optimal solution.
EDIT:
I have checked everywhere for an existing implementation before posting a question here. I came here to get ideas on how to get started implementing this code.
I have included this in your post so people can better know what you expect of them. However, "ideas on how to get started implementing" is still not much more specific than "how do I write code in MATLAB".
However, I will still try to answer:
1. Make the structure of the code: write the basic loops and fill them with comments describing what you want to do.
2. Pick one of those comments (the easiest, or the first) and see whether you can make it happen in a few lines; you can test it by generating some dummy input for that piece of code.
3. Keep repeating step 2 until all comments have been replaced by the required code.
The heart of this particular program is the priority queue itself; a sketch of that data structure follows below. If you get stuck in one of the blocks, and have searched but not found the answer to a specific question, then this is not a bad place to ask.
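Since the priority queue is the piece this question is really about, here is a minimal array-backed binary min-heap, sketched in C rather than MATLAB (my own illustration; push, pop and the fixed MAXN capacity are arbitrary choices). The parent/child index arithmetic carries over directly to a MATLAB vector, shifted by one for 1-based indexing:

#include <stdio.h>

#define MAXN 1024

/* Minimal binary min-heap keyed on a node's bound; in the
   branch-and-bound pseudocode, "remove the node with the best
   bound" is a pop from this heap. */
typedef struct { double bound[MAXN]; int n; } heap;

static void swap(double *a, double *b) { double t = *a; *a = *b; *b = t; }

static void push(heap *h, double key) {          /* insert(PQ, u) */
    int i = h->n++;
    h->bound[i] = key;
    while (i > 0 && h->bound[(i - 1) / 2] > h->bound[i]) {
        swap(&h->bound[(i - 1) / 2], &h->bound[i]);
        i = (i - 1) / 2;                         /* sift up */
    }
}

static double pop(heap *h) {                     /* remove(PQ, v) */
    double best = h->bound[0];
    h->bound[0] = h->bound[--h->n];
    int i = 0;
    for (;;) {                                   /* sift down */
        int l = 2 * i + 1, r = l + 1, m = i;
        if (l < h->n && h->bound[l] < h->bound[m]) m = l;
        if (r < h->n && h->bound[r] < h->bound[m]) m = r;
        if (m == i) break;
        swap(&h->bound[i], &h->bound[m]);
        i = m;
    }
    return best;
}

int main(void) {
    heap h = { .n = 0 };
    push(&h, 3.0); push(&h, 1.0); push(&h, 2.0);
    while (h.n > 0) printf("%g\n", pop(&h));     /* prints 1 2 3 */
    return 0;
}

This version treats the smallest bound as best, as in a minimization problem; for maximization, flip the comparisons or store negated bounds. In MATLAB you would typically keep a parallel array (or struct array) holding the node payloads alongside the bounds.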

In Simulink, are Goto and From blocks generally considered bad style?

I was working on a Simulink model recently and was using Goto and From blocks to keep a very busy system from becoming a twisted mess of wires. I was informed that I was not to use Goto and From blocks as they are considered bad style (at least, according to my employer).
While I hold that wires should be kept connected whenever possible, I believe that Goto and From blocks can significantly improve the readability of a system/subsystem if the model would result in lots of crossed wires otherwise; especially if the blocks can be color-coded (e.g. purple Goto block goes to all the purple From blocks).
I'd supply an image of the subsystem I'm working with, but I'm not sure I can put it on here. The subsystem itself has about 12 subsystem blocks (and possibly more later) within it, each with two bus-type outputs. The first output of each subsystem goes to one Bus Creator block, and the second output of each goes to a second Bus Creator block. Since the subsystems are aligned vertically and the Bus Creators are to the right, this results in many crossed wires. I was using Goto and From blocks to clean up the system.
I can supply an image of a smaller, but similar model that I put together for this question.
For a system with on the order of 12 subsystems, this becomes very busy. I was using Goto and From blocks to connect the subsystems and the Bus Creators without a plethora of crossed wires.
I believe my employer may be carrying the stigma of using goto statements from text-based languages and applying it to Goto/From blocks in Simulink. Generally speaking, is using Goto and From blocks in this way (or any way) considered to be bad style?
The Mathworks Automotive Advisory Board has published some modeling guidelines (PDF) that include usage of Goto/From. The rules they list are:
Do not have subsystems that are floating, i.e. with all input/output ports connected via Gotos. One of the great things about Simulink is the ability to determine signal flow with only a cursory visual inspection; do not destroy this by linking everything with Gotos. At least have one feed-forward and one feedback loop between subsystems connected by signal lines.
My personal opinion on feedback signals is that they should all be connected with signal lines, but I'm sure you can come up with cases where drawing all of them clutters the model.
The second guideline is about the scope of the Goto tag; keep the visibility local as much as possible.
I feel setting visibility to scoped is also acceptable, as long as you're not using the matching From more than a couple of levels downstream from the Goto. I've yet to come across a legitimate need for a global Goto tag.
So, not all Goto usage is bad, and you're right that it can improve readability in some cases. That being said, I don't think Gotos are justified for the picture above. I realize it is just an example, but I should point out that if the buses being created are virtual, the order of the inputs at the creator doesn't matter, and rearranging Bus Creator and Mux block inputs can work wonders for readability.
The problem with the guidelines above is that there's room for bending them, and developers on your team might do just that. Even if everyone is diligent about following them at first, you may run afoul of these guidelines one day, a long time from now, when you redraw that section of the model while refining or adding functionality. Rearranging inputs and outputs can be especially irritating in the middle of implementing some cool new feature. That may be the reason your employer chose to impose a blanket ban: it is inconvenient in some cases, but easier to enforce.

Looking for the best equivalents of prefetch instructions for ia32, ia64, amd64, and powerpc

I'm looking at some slightly confused code that attempts a platform abstraction of prefetch instructions, using various compiler builtins. It appears to be based on PowerPC semantics initially, with read and write prefetch variations using dcbt and dcbtst respectively (both of these passing TH=0 in the new optional stream opcode).
On ia64 platforms we've got for read:
__lfetch(__lfhint_nt1, pTouch)
whereas for write:
__lfetch_excl(__lfhint_nt1, pTouch)
This (read vs. write prefetching) appears to match the PowerPC semantics fairly well (with the exception that ia64 allows for a temporal hint).
Somewhat curiously, the ia32/amd64 code in question is using
prefetchnta
not
prefetcht1
as it would if that code were consistent with the ia64 implementations (#ifdef variations of that exist in our code for our (still live) hpipf port and our now-dead windows and linux ia64 ports).
Since we are building with the Intel compiler, I should be able to make many of our ia32/amd64 platforms consistent by switching to the xmmintrin.h builtins:
_mm_prefetch( (char *)pTouch, _MM_HINT_NTA )
_mm_prefetch( (char *)pTouch, _MM_HINT_T1 )
... provided I can figure out what temporal hint should be used.
Questions:
Are there read vs. write ia32/amd64 prefetch instructions? I don't see any in the instruction set reference.
Would one of the nt1, nt2, nta temporal variations be preferred for read vs. write prefetching?
Any idea if there would have been a good reason to use the NTA temporal hint on ia32/amd64, yet T1 on ia64?
Are there read vs. write ia32/amd64 prefetch instructions? I don't see any in the instruction set reference.
Some systems support the prefetchw instruction for writes.
Would one of the nt1, nt2, nta temporal variations be preferred for read vs. write prefetching?
If the line is used exclusively by the calling thread, it shouldn't matter how you bring the line in; both reads and writes would be able to use it. The benefit of prefetchw, mentioned above, is that it brings the line and gives you ownership of it, which may take a while if the line was also in use by another core. The hint level, on the other hand, is orthogonal to the MESI states and only affects how long the prefetched line survives. This matters if you prefetch long before the actual access and don't want the prefetch to get lost in that duration, or alternatively if you prefetch right before the access and don't want the prefetches to thrash your cache too much.
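To make those two independent knobs concrete, here is a sketch of how they are expressed from C (my own illustration; the loop and the AHEAD distance are invented). On ia32/amd64, _mm_prefetch carries only the temporal hint, while GCC/Clang's __builtin_prefetch also takes a read/write-intent flag and may emit prefetchw where it is supported:

#include <xmmintrin.h>  /* _mm_prefetch, _MM_HINT_* */

void process(float *data, long n) {
    enum { AHEAD = 16 };  /* illustrative prefetch distance */
    for (long i = 0; i < n; ++i) {
        if (i + AHEAD < n) {
            /* Temporal hint only; x86 has no read/write variant here. */
            _mm_prefetch((char *)&data[i + AHEAD], _MM_HINT_T1);

            /* Portable builtin: 2nd arg 1 = write intent,
               3rd arg 0..3 = locality (0 ~ NTA ... 3 ~ T0). */
            __builtin_prefetch(&data[i + AHEAD], 1, 2);
        }
        data[i] *= 2.0f;  /* dummy read-modify-write of the line */
    }
}

In real code you would use one of the two calls, not both; they are shown together only to contrast the interfaces.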
Any idea if there would have been a good reason to use the NTA temporal hint on ia32/amd64, yet T1 on ia64?
Just speculating: perhaps the larger caches and aggressive memory bandwidth are more vulnerable to bad prefetching, and you'd want to reduce the impact through the non-temporal hint. Consider what happens when your prefetcher is suddenly set loose to fetch anything it can: you'd end up swamped in junk prefetches that would throw away lots of useful cache lines. The NTA hint makes them overrun each other, leaving the rest of the cache undamaged.
Of course, this may also just be a bug; I can't tell for sure (only whoever developed it can), but it might make sense for the reason above.
The best resource I could find on x86 prefetching hint types was the good ol' article What Every Programmer Should Know About Memory.
For the most part on x86 there aren't different instructions for read and write prefetches. The exceptions seem to be the non-temporal (aligned) store instructions, where a write can bypass the cache, but as far as I can tell a read will always get cached.
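For the write side, the usual example is a streaming (non-temporal) store, which bypasses the cache entirely. A minimal sketch, assuming SSE2 and 16-byte-aligned buffers (the function name is mine):

#include <emmintrin.h>  /* SSE2: _mm_stream_si128, _mm_load_si128 */

/* Copy n 16-byte blocks without polluting the cache with the
   destination: streaming stores are write-combined and sent to
   memory instead of allocating cache lines. Both pointers must
   be 16-byte aligned. */
void stream_copy(__m128i *dst, const __m128i *src, long n) {
    for (long i = 0; i < n; ++i)
        _mm_stream_si128(&dst[i], _mm_load_si128(&src[i]));
    _mm_sfence();  /* make the streaming stores globally visible */
}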
It's going to be hard to backtrack through why the earlier code owners used one hint and not the other on a certain architecture. They could be making assumptions about how much cache is available on processors in that family, typical working set sizes for binaries there, long term control flow patterns, etc... and there's no telling how much any of those assumptions were backed up with good reasoning or data. From the limited background here I think you'd be justified in taking the approach that makes the most sense for the platform you're developing on now, regardless what was done on other platforms. This is especially true when you consider articles like this one, which is not the only context where I've heard that it's really, really hard to get any performance gain at all with software prefetches.
Are there any more details known up front, like typical cache miss ratios when using this code, or how much prefetches are expected to help?

How to determine SSE prefetch instruction size?

I am working with code which contains inline assembly for SSE prefetch instructions. A preprocessor constant determines whether the instructions for 32-, 64- or 128-byte prefetches are used. The application is used on a wide variety of platforms, and so far I have had to investigate in each case which is the best option for the given CPU. I understand that this is the cache line size. Is this information obtainable automatically? It doesn't seem to be explicitly present in /proc/cpuinfo.
I think your question is related to this question or this one. I think it is clear that, unless you can rely on an OS or library function, you will want to use the CPUID instruction, but the question then becomes exactly what information you are looking for; and of course, AMD's and Intel's implementations need not agree. This page suggests using CPUID.1.EBX[15:8] (i.e., BH) for finding out on Intel, and function 80000005h on AMD. In addition, on Intel, CPUID.2... seems to contain the relevant information, but it looks like a real pain to parse out the desired information.
I think, from what I've read, both AMD and Intel CPUID instructions will support CPUID.1.EBX[15:8], which returns the size of one cache line in QUADWORDs as used by the CLFLUSH instruction (which isn't present on all processors, so I don't know whether you'll always find something there). So, after executing CPUID.1, you'd have to multiply BH by 8 to get the cache line size in bytes. This hinges on my implicit assumption (please can anyone say whether it is really valid?) that the definition of one cache line size is always the same for CLFLUSH and PREFETCHh instructions.
Also, Intel's manual states that PREFETCHh is only a hint, but that, if it prefetches anything, it will always fetch a minimum of 32 bytes.
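Putting the CPUID approach above into code, here is a sketch using GCC/Clang's cpuid.h (my own; it covers only the CPUID.1.EBX[15:8] path, not AMD's 80000005h function):

#include <stdio.h>
#include <cpuid.h>  /* GCC/Clang: __get_cpuid */

/* Returns the CLFLUSH line size in bytes, or 0 if CPUID leaf 1
   is unavailable. EBX bits 15:8 give the size in quadwords
   (8-byte units), hence the multiply by 8. */
static unsigned cache_line_size(void) {
    unsigned eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;
    return ((ebx >> 8) & 0xFF) * 8;
}

int main(void) {
    printf("cache line: %u bytes\n", cache_line_size());
    return 0;
}

Whether that value is also the right granularity for PREFETCHh rests on the same assumption discussed above.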
EDIT1:
Another useful resource (even if not directly answering your question) for the optimised use of PREFETCHh is Intel's optimisation manual here.