How to use ADC and set the carry flag in assembly

I understand what ADC does but I'm not sure how to manage the carry flag. If I use a regular ADD and it overflows, will it automatically set the carry flag to be 1? And if I use ADC and the CF is 1 and it doesn't overflow, will it set the CF to be 0? Thanks.

Assuming Intel x86 assembly:
Both ADD and ADC set the Carry Flag when there is a carry out of the high-order bit, and clear it otherwise.
So using ADC when CF is 1 will result in CF=0 whenever the addition (including the carry-in) does not itself carry out.
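To see the rule in action, here is a minimal C sketch of those semantics (a model of what the instructions do, not the instructions themselves): both operations compute a sum and then unconditionally rewrite CF from the carry out of the top bit, which is why the ADC step below ends with CF=0 even though it consumed a carry-in of 1.

#include <stdint.h>
#include <stdio.h>

/* Model of x86 ADD/ADC on 32-bit operands: both compute a sum and then
   overwrite CF with the carry out of bit 31 (set or cleared, never left alone). */
static uint32_t add32(uint32_t a, uint32_t b, int carry_in, int *cf)
{
    uint64_t wide = (uint64_t)a + b + carry_in;  /* carry_in is 0 for ADD */
    *cf = (int)((wide >> 32) & 1);               /* CF := carry out */
    return (uint32_t)wide;
}

int main(void)
{
    int cf;
    /* 64-bit addition from 32-bit pieces: ADD on the low halves, ADC on the high. */
    uint32_t lo = add32(0xFFFFFFFFu, 1, 0, &cf);  /* carries out, so CF=1 */
    uint32_t hi = add32(1, 0, cf, &cf);           /* consumes CF=1, no carry out, so CF=0 */
    printf("hi=%08X lo=%08X cf=%d\n", (unsigned)hi, (unsigned)lo, cf);
    return 0;
}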

For details, see the official reference at www.intel.com, page 498.
Description
Adds the destination operand (first operand), the source operand (second operand), and the carry (CF) flag and stores the result in the destination operand. The destination operand can be a register or a memory location; the source operand can be an immediate, a register, or a memory location. (However, two memory operands cannot be used in one instruction.) The state of the CF flag represents a carry from a previous addition. When an immediate value is used as an operand, it is sign-extended to the length of the destination operand format.
[...]

Related

Please clarify values for STM32 SYSCFG_EXTICR registers

The following is an excerpt from the STM32F0x1 reference manual:
There are a lot of x's in play here. They all seem to be referring to the external interrupt number (i.e. EXTIx), but what is the x circled in red? It's a part of the binary number, so it is either 0 or 1, not the interrupt number or pin number. The documentation mentions nothing about what it means when it's 0 and when it's 1. If I dig through the header files, it seems this x is always zero.
Is the documentation just incomplete, or did I miss something or am I misunderstanding?
The documentation states that the meaning of the x-bit (the left-most bit) is reserved - i.e. it has no documented meaning for the time being.
As you indicate that the header files always leave it at 0, that is what you want to use yourself.
It is usually documented this way because a new variant of the MCU might be offered later with a different configuration (i.e. other peripherals or similar) - and in those configurations the bit could then be documented to have a distinct meaning.
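In practice, the way to honor that is a read-modify-write that touches only the documented field, which leaves the reserved bit at 0. A short C sketch, assuming ST's CMSIS device header for the STM32F0 family; the macro names below are taken from those headers and may differ in your header set:

#include "stm32f0xx.h"  /* ST CMSIS device header (assumed) */

/* Route EXTI line 0 to port B without disturbing any other field,
   including the reserved bit, which stays 0 through the read-modify-write. */
void exti0_select_port_b(void)
{
    uint32_t reg = SYSCFG->EXTICR[0];      /* EXTICR1 covers EXTI0..EXTI3 */
    reg &= ~SYSCFG_EXTICR1_EXTI0;          /* clear the documented EXTI0 field */
    reg |= SYSCFG_EXTICR1_EXTI0_PB;        /* select port B for this line */
    SYSCFG->EXTICR[0] = reg;
}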

Was there ever a first parameter for the CLEAR statement?

In both GW-BASIC and QuickBASIC, statements are passed arguments, some of which are optional and can be omitted depending on the statement:
REM Move the text cursor to the specified column and row.
LOCATE row%, column%
REM Move the text cursor to the specified column without changing the row.
LOCATE , column%
In GW-BASIC, the CLEAR statement is rather unusual in that its first "argument" is always omitted:
CLEAR , basicMem
CLEAR , basicMem, basicStack
CLEAR , , basicStack
In QuickBASIC, the basicMem parameter became optional due to the interpreter/runtime managing its own memory:
CLEAR , , basicStack
What I'm wondering is whether that first "argument" was ever used for anything prior to GW-BASIC, i.e. whether something like this was actually useful:
CLEAR missingArg, basicMem, basicStack
REM ^^^^^^^^^^
REM here
That is, was there ever a purposeful non-empty argument before the first comma?
If anybody has any idea, I'd love to know!
What I'm wondering is whether that first "argument" was ever used for
anything prior to GW-BASIC, i.e. whether something like this was actually
useful:
CLEAR missingArg, basicMem, basicStack
REM ^^^^^^^^^^
REM here
That is, was there ever a purposeful non-empty argument before the
first comma?
Yes, there was a first argument, but there was never a 3-argument form that actually made use of it.
Microsoft (originally Micro-Soft) created Altair BASIC. It featured a CLEAR command with no arguments that set all program variables to zero. The 4K version had no strings, so it had no need for managing string space. However, the 8K, Extended, and Disk versions had a CLEAR command that also accepted a single argument of the form CLEAR x. The value x specified the maximum amount of string space available in bytes, with the default at load time of BASIC being 50 bytes in the 8K version and 200 bytes in the Extended and Disk versions until it was changed [source]. That's where the missing first argument came from and what it was used for originally. At the time, however, only that one argument was valid.
Microsoft went on to develop a derivative called "BASIC-80" for several systems, notably the Intel ISIS-II, CP/M, and TEKDOS operating systems. A "Standalone Disk BASIC" version of BASIC-80 was also created that could run on "almost any 8080 or Z80 based disk hardware without an operating system." There was no 4K version of BASIC-80, so it's reasonable to assume all versions of BASIC-80 had strings available, as the 8K version of Altair BASIC did. As a result, that string space needed to be managed. However, it was in BASIC-80 that a second argument was added:
CLEAR [expression![,address]]
expression! was an expression that specified the amount of string space, like in 8K (Altair) BASIC, and address was the maximum address available to BASIC, i.e. the amount of memory available to BASIC, like the argument immediately after the first comma in GW-BASIC.
Eventually, BASIC-80, Release 5.0, was shipped into the world, and it featured this odd syntax instead:
CLEAR [,[expression1][,expression2]]
expression1 was the maximum memory available to BASIC, and expression2 was the amount of stack space. Appendix A: New Features in BASIC-80, Release 5.0 explains why the first argument was dropped:
String space is allocated dynamically, and the first argument in a two-argument CLEAR statement will be ignored.
In other words, CLEAR strSpace!,maxMem would ignore the strSpace! argument in BASIC-80, Release 5.0, so the syntax became CLEAR [,[maxMem][,maxStack]].
QuickBASIC eventually changed the syntax further to just CLEAR [,,stack].
Confusingly, the on-line help system of QuickBASIC 4.5 states the following:
Note: Two commas are used before stack to keep QuickBASIC compatible
with BASICA. BASICA included an additional argument that set the
size of the data segment. Because QuickBASIC automatically manages
the data segment, the first parameter is no longer required.
"The first parameter" mentioned is maxMem as BASICA (and GW-BASIC) used the syntax available with BASIC-80, Release 5.0, rather than the equally missing strSpace! parameter used by pre-5.0 releases of BASIC-80.

Immediate Addressing mode difference?

Recently, while studying the concept of addressing modes, the first type being immediate addressing mode, I came across the example ADD #NUM1,R0 (instruction execution from left to right).
Here, is the address of NUM1 stored in Register R1?
What about when we do ADD #4,R0 to make it point to the next data? When we use #4, I understood that it adds 4 to the contents of register R0. Is there a difference between using #NUM1 and #4? Please explain!
Is there a difference between using #NUM1 and #4
In the final machine code in the executable that a CPU will actually run, no, there isn't.
If you have an assembler that directly creates an executable (no separate linking step), then the assembler will know at assemble time the numeric address of NUM1, and simply expand it as an immediate, producing exactly the same machine code as if you'd written add #0x700, R0. (Assuming the NUM1 label ends up at address 0x700 for this example.)
e.g. if the machine encoding for add #imm, R0 is 00 00 imm16, then you'll get 00 00 07 00 (assuming a big-endian immediate).
Here, is the address of NUM1 stored in Register R1?
No, it's added to R0. If R0 previously contained 4, then R0 will now hold the address NUM1+4.
R1 isn't affected.
Often you have an assembler and a separate linker (e.g. as foo.s -o foo.o to assemble and then link with ld -o foo foo.o).
The numeric address isn't available at assemble time, only at link time. An object file format holds metadata for the symbol relocations, which let the linker fill in the absolute numeric addresses once it decides where the code will be loaded.
The resulting machine code will still be the same.
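To make the "same machine code" point concrete, here is a toy C sketch of an assembler's final step, reusing the hypothetical 00 00 imm16 encoding from above (the encoding, the 0x700 address, and the emit_add_r0 helper are all illustrative, not any real ISA): once the label has been resolved to a number, emitting #NUM1 and emitting a literal are the same operation.

#include <stdint.h>
#include <stdio.h>

/* Toy back-end for the made-up encoding above: add #imm, R0 -> 00 00 imm16,
   with a big-endian immediate. */
static void emit_add_r0(uint16_t imm, uint8_t out[4])
{
    out[0] = 0x00;
    out[1] = 0x00;
    out[2] = (uint8_t)(imm >> 8);
    out[3] = (uint8_t)(imm & 0xFF);
}

int main(void)
{
    uint16_t NUM1 = 0x0700;     /* the address the label resolved to */
    uint8_t a[4], b[4];
    emit_add_r0(NUM1, a);       /* add #NUM1, R0 (label, already resolved) */
    emit_add_r0(0x0700, b);     /* add #0x700, R0 (literal) */
    for (int i = 0; i < 4; i++)
        printf("%02X %02X\n", a[i], b[i]);  /* both columns read 00 00 07 00 */
    return 0;
}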

What's the difference between a Sentinel value and an End-of-file character?

This question stemmed from this (Software Development) textbook question:
A value used to indicate the end of a data stream is called:
1. a sentinel value.
2. an end of file (EOF) character.
3. a flag.
4. a driver.
The correct answer is apparently 1, though I answered 2.
I wasn't able to find a definition in the textbook of an end of file character though I did find the definition of a sentinel value.
Sentinel value (Textbook)
A dummy value used to indicate the end of data within a file. Sentinel is from the word sentry, a sentry being a guard who prevents passage of unauthorised persons.
However, this contradicts what I found on Wikipedia (sources seem legit).
Sentinel value (Wikipedia)
... a special value in the context of an algorithm which uses its presence as a condition of termination, typically in a loop or recursive algorithm
Then the End-of-file definition.
End-of-file (Wikipedia)
... a condition in a computer operating system where no more data can be read from a data source. The data source is usually called a file or stream.
So, from this, it seems the better (or correct?) answer is 2 since the question is asking about a "data stream". Does this mean the textbook definition is wrong or "dumbed down", or is an End-of-file character classed as a sentinel value?
In these software textbooks, sentinel values are usually associated with file streams and the like.
If I remember correctly, chapters discussing sentinel values generally focus on file handling, so in this case the answer would be 1.
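The distinction is easiest to see in code. A minimal C sketch (the -1 sentinel is an arbitrary choice for illustration): the sentinel is an in-band value that belongs to the data's own format, while EOF is an out-of-band condition reported by the I/O layer rather than a character stored in the stream.

#include <stdio.h>

int main(void)
{
    int n, sum = 0;

    /* Sentinel-controlled loop: -1 is a dummy value inside the data
       that the format itself uses to say "stop". */
    while (scanf("%d", &n) == 1 && n != -1)
        sum += n;

    /* EOF-controlled loop: there is no terminating value in the data;
       scanf simply stops matching when the source runs out. */
    while (scanf("%d", &n) == 1)
        sum += n;

    printf("%d\n", sum);
    return 0;
}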

Why are there several encodings for one instruction in ARMv7?

I am currently trying to implement a disassembler for the ARM Cortex-A9, which implements the ARMv7 instruction set.
For that I am using the manual "DDI0406C_b_arm_architecture_reference_manual.pdf" that can be downloaded here (after registering on the ARM website):
http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.set.architecture/index.html
In this manual, in part A8.8 with the instruction details, I can't understand why there are several encodings for one instruction (like A1, A2, ...) that all seem to be implemented in ARMv7.
Also, as the ARM Cortex-A9 uses Thumb-2, does it also implement the A1/A2/... encodings, or only T1/T2...?
I have read all the parts of this manual related to encodings, but I still don't understand how we can know which encoding is used for a program.
Different encodings of an instruction do functionally different things.
One example of the use of different encodings is A8.9.12 ADR:
This instruction adds an immediate value to the PC value to form a PC-relative address, and writes the result to the
destination register.
If the instruction is encoded as A1, the offset is interpreted as zero or positive; if it is encoded as A2, the offset is negative.
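A small C sketch of that rule (my reading of the ADR pseudocode in that section; worth double-checking against the manual itself): the two encodings share one operation, and the encoding choice decides the sign of the offset applied to the aligned PC.

#include <stdint.h>

/* ADR semantics per the manual's pseudocode: the result is
   Align(PC,4) + imm32 for the add form (A1) and Align(PC,4) - imm32
   for the subtract form (A2). pc here is the value the instruction
   reads (instruction address + 8 in ARM mode). */
static uint32_t adr_target(uint32_t pc, uint32_t imm32, int encoded_as_a2)
{
    uint32_t base = pc & ~3u;   /* Align(PC, 4) */
    return encoded_as_a2 ? base - imm32 : base + imm32;
}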
Another example is A8.8.132 POP
If the list contains more than one register, the instruction is assembled to encoding A1. If the list contains exactly one register, the instruction is assembled to encoding A2.
I imagine the different POP encodings were probably created to allow different microcode paths for performance reasons.
For the second part of your question, the Cortex-A9 is an ARMv7-A architecture CPU and supports all the instructions specified in the manual you pointed to. Maybe you should also read the Cortex-A9 Technical Reference Manual.
There is no way to really distinguish between ARM and Thumb inside the instruction stream. You can only decide based on the way a function gets called (if the lowest bit of the target address is set to 1 then it's Thumb, otherwise ARM).
The ARM encodings are quite "stable"; you'll rarely find more than an A1 encoding. BLX is an example where an A2 encoding is given, but this is mainly because the new ARM ARM is structured differently from the older ones. BL and BLX were two different instructions; BLX was added in additional instruction space (the upper 4 bits, which are normally used for conditions, are set to 1111, which in ARM prior to v5 meant "never execute").
For the Thumb encodings it's different: there are a lot of them, because they had to be placed in a more compressed instruction space. Page A6-220 has information about how to decide whether a Thumb instruction consists of two halfwords or just one.
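For what it's worth, that length decision is a five-bit test on the first halfword. A small C sketch of the rule as that section describes it (check it against the manual before relying on it): if bits [15:11] of the first halfword are 0b11101, 0b11110, or 0b11111, the instruction continues into a second halfword.

#include <stdint.h>

/* Thumb instruction length from the first halfword, following the
   decode rule around page A6-220 of the manual. */
static int thumb_insn_size_bytes(uint16_t first_halfword)
{
    unsigned top5 = first_halfword >> 11;
    if (top5 == 0x1D || top5 == 0x1E || top5 == 0x1F)
        return 4;   /* 32-bit Thumb-2 encoding (two halfwords) */
    return 2;       /* 16-bit Thumb encoding */
}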
The Ax encodings are ARM: when the processor is in ARM mode, it will decode the bits it finds using those encodings. If there is more than one (A1, A2), it should be obvious that there is a different feature or reason for each; those can be considered separate instructions (look at the overuse of the MOV instruction in x86, for example; it has many encodings). Treat each encoding as a separate "instruction".
Then there are the Tx variants; those are Thumb and Thumb-2 extensions. The Thumb encodings are all 16 bits (BL can be decoded as two separate 16-bit instructions), and the descriptions below them indicate "all Thumb variants" or "ARMv4T to the present" or some such language. The Thumb-2 extensions are all 32 bits, the first 16 bits being an undefined instruction in the original Thumb world. These have more limitations on which architectures support them.
You are not going to be able to completely create a disassembler for one of these processors, for the same reason you can't for x86 or many other processors (all?). If you assume that all the instructions are in one mode (ARM, or Thumb, or Thumb+Thumb2) with no mode mixing (ARM+Thumb), then you can, because everything is a fixed instruction length: you can simply disassemble everything, data and code, and you won't run into any problems. To disassemble mixed-mode code you basically have to emulate/execute the instructions and follow the instruction flow (just like a disassembler for a variable-word-length instruction set) to try to find the transitions. The problem, of course, is that the transitions take more than one instruction at a minimum: load a register, then BX that register. Sometimes there is math involved in computing the target address, and there is no guarantee that the address computation or load happens in the instruction immediately before the BX. So you could do some of that and get a long way through disassembling the program.
If Thumb-2 is supported/allowed on the processor you are using, then you have the variable-instruction-length problem whenever you detect entry points to Thumb code, and unless you are already doing this, you have to follow the execution of the code to determine where instructions start (elementary variable-instruction-length disassembly stuff).
The combination of the technical reference manual and the architecture reference manual will tell you whether the architecture and the implementation of that architecture (TRM) support ARM and Thumb modes. I would assume an A9 supports ARM, Thumb, and Thumb-2, all three.
The Cortex-M family is the only one so far that does not support ARM mode, and its Thumb-2 support varies widely: the Cortex-M0 (and M1) are ARMv6-M while the M3 and M4 are ARMv7-M (a few dozen Thumb-2 extensions in ARMv6-M versus many dozen in ARMv7-M). There are separate architecture reference manuals specifically for the -M variants, e.g. the ARMv7-M manual versus the ARMv7-AR manual.