Why does data stored in registers have memory addresses? - iphone

If I have the following code:
- (int)number {
    int i = 3;
    return i;
}
I can get the memory address of the integer i by doing &i (say, while paused at a breakpoint on the return line).
However, the corresponding assembly (ARM) will simply be:
MOV R0, #3
Nowhere is memory needed (except to store the instruction), so how can i have a memory address?

That code might not need to use memory, but that does not mean it doesn't use memory. The compiler can implement it however it wants. Without optimization, this means variables will probably all be stored in memory, whether they need to be or not. For example, consider this very basic program:
int main() {
    int i = 0;
    return i;
}
With optimization disabled (which it is by default), Apple clang 4.0 gives me the following assembly:
_main:
    sub sp, sp, #4
    movw r0, #0
    str r0, [sp]
    add sp, sp, #4
    bx lr
With optimization enabled, I get a much simpler program:
_main:
    mov r0, #0
    bx lr
As you can see, the unoptimized version stores the 0 in memory, but the optimized version doesn't. If you were to use the optimized version in the debugger, it would fail to give you the address of i. I actually got an error that i was undefined, since it had been optimized out completely.

Related

Is data race safe in ARMv8?

As we know, accessing aligned fundamental data types on the Intel x86 architecture is atomic. How about ARMv8?
I have tried to find the answer in the Arm Architecture Reference Manual for Armv8-A, and I did find something related to atomicity: ARMv8 is other-multi-copy atomic. It promises that accesses by multiple threads to the same LOCATION are atomic. But it says a LOCATION is a byte. I am wondering: if thread 1 writes an aligned uint64_t without a lock, and thread 2 reads or writes it without a lock at the same time, is that atomic? (A uint64_t is 8 bytes, but a LOCATION is only one byte.)
This is explained in B2.2 of the ARMv8 Architecture Reference Manual. In general, ordinary loads and stores of up to 64 bits, if naturally aligned, are single-copy atomic. In particular, if one thread stores to an address and another loads that same address, the load is guaranteed to see either the old or the new value, with no tearing or other undefined behavior. This is roughly analogous to a relaxed load or store in C or C++; indeed, you can see that compilers emit ordinary load and store instructions for such atomic accesses. https://godbolt.org/z/cWjaed9rM
Let's prove this for an example. For simplicity, let's use an aligned 2-byte halfword H, calling its bytes H0 and H1. Suppose that in the distant past, H was initialized to 0x0000 by a store instruction Wi; the respective writes to bytes H0 and H1 will be denoted Wi.0 and Wi.1. Now let a new store instruction Wn = {Wn.0,Wn.1} store the value 0xFFFF, and let it race with a load instruction R = {R.0,R.1}. Each of the accesses Wi, Wn, R is single-copy atomic by B2.2.1, first two bullets. We wish to show that either R.0,R.1 both return 0x00, or else they both return 0xFF.
By B2.3.2 there is a reads-from relation pairing each read with some write. R.0 must read-from either Wi.0 or Wn.0, as those are the only two writes to H0, and thus it must return either 0x00 or 0xFF. Likewise, R.1 must also return either 0x00 or 0xFF. If they both return 0x00 we are done, so suppose that one of them, say R.1, returns 0xFF, and let us show that R.0 also returns 0xFF.
We are supposing that R.1 reads-from Wn.1. By B2.2.2 (2), none
of the overlapping writes generated by Wn are coherence-after the corresponding overlapping reads generated by R, in the sense of B2.3.2. In particular, Wn.0 is not coherence-after R.0.
Note that Wn.0 is coherence-after Wi.0 (coherence order is a total order on writes, so one must come after the other, and we are assuming Wi took place very long ago, with sufficient sequencing or synchronization in between). So if R.0 reads-from Wi.0, we then have that Wn.0 is coherence-after R.0 (definition of coherence-after, second sentence). We just argued that is not the case, so R.0 does not read-from Wi.0; it must read-from Wn.0 and therefore return 0xFF. ∎
Note that on x86, ordinary loads and stores implicitly come with acquire and release ordering respectively, and this is not true on ARM64. You have to use ldar / stlr for that.

Multidriven nets: Synthesis ok, Simulation fails

I have a fundamental understanding problem with SystemVerilog. I am working on a processor design where some bus systems should be shared between several processing units (SystemVerilog modules). With an arbiter, only one module at a time should be active and driving the bus, while all others are high impedance.
I got rid of the multidriven-net warnings in Vivado during synthesis, and there are no longer any bus conflicts, but the simulator warns that the bus signals 'might' be multidriven. I made a tiny example; I would expect 'data' to be '11' when 'select' is '10'.
While the simulation stops entirely in Vivado, it works with the Cadence simulator, but with wrong results (screenshot: simulation).
testbench.sv
`timescale 1ns / 1ps
module testbench_top();
  logic [1:0] select;
  logic [1:0] data;

  top top_inst(.*);

  initial begin
    select = 0;
    #2 select = 1;
    #2 select = 2;
    #2 select = 0;
  end

  initial begin
    $monitor("t=%3d s=%b,d=%b\n", $time, select, data);
  end
endmodule
design.sv
`timescale 1ns / 1ps
module top
(
  input  logic [1:0] select,
  output logic [1:0] data
);
  driver_1 driver_1_inst(.*);
  driver_2 driver_2_inst(.*);
endmodule

module driver_1
(
  input  logic [1:0] select,
  output logic [1:0] data
);
  always_comb begin
    if (select == 2'b10)
      data = 2'b11;
    else
      data = 'z;
  end
endmodule

module driver_2
(
  input  logic [1:0] select,
  output logic [1:0] data
);
  always_comb begin
    if (select == 2'b01)
      data = 2'b01;
    else
      data = 'z;
  end
endmodule
I'm assuming you expect the value of the data signal in the top module, which is driven by the two outputs of your driver modules, to be resolved (i.e. when one drives 'z, the other gets the bus).
This will happen if you declare the top.data port as output wire logic [1:0] data.
Section 23.2.2.3, "Rules for determining port kind, data type, and direction", of the IEEE 1800-2012 standard states that

For output ports, the default port kind depends on how the data type is specified:
— If the data type is omitted or declared with the implicit_data_type syntax, the port kind shall default to a net of default net type.
— If the data type is declared with the explicit data_type syntax, the port kind shall default to variable.
In your case, the second clause applies: since you declared data as output logic [1:0], it was interpreted as a variable and not a net. Multiple drivers on a variable aren't resolved (and in some tools are also illegal).
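A minimal sketch of the fix (only the port declaration of top changes; the drivers stay as they are):

```systemverilog
// Declaring data as a net (wire) makes the two drivers' values resolve:
// when one drives 'z, the other's value wins on the bus.
module top
(
  input  logic [1:0] select,
  output wire logic [1:0] data   // net port kind: multiple drivers resolve
);
  driver_1 driver_1_inst(.*);
  driver_2 driver_2_inst(.*);
endmodule
```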

Microchip dsPIC33 C30 function pointer size?

The C30 user manual states that near and far pointers are 16 bits wide.
How then can they address the full code memory space, which is 24 bits wide?
I am confused, as I have an assembler function (called from C) that returns the program counter (from the stack) where a trap error occurred. I am pretty sure it sets w1 and w0 before returning.
In C, the return value is defined as a function pointer:
void (*errLoc)(void);
and the call is:
errLoc = getErrLoc();
When I now look at errLoc, it is a 16 bit value and I just do not think that is right. Or is it? Can function pointers (or any pointers) not access the full code address space?
All this has to do with a TRAP Adress error I am trying to figure out for the past 48 hours.
I see you are trying to use the dsPIC33Fxxxx/PIC24Hxxxx fault interrupt trap example code.
The problem is that the pointer size for the dsPIC33 (via the MPLAB X C30 compiler) is 16 bits, but the program counter is 24 bits. Fortunately, the getErrLoc() assembly function does return the correct size.
However, the function signature provided in the example C source, void (*getErrLoc(void))(void), is incorrect, as it treats the return value as a 16-bit pointer. Change the return type of getErrLoc() to something large enough to hold the 24-bit program counter. If you choose unsigned long, the 24-bit program counter fits comfortably in the 32-bit value:
unsigned long getErrLoc(void); // Get Address Error Loc
unsigned long errLoc __attribute__((persistent));
(FYI: __attribute__((persistent)) is used so the recorded trap location survives the next reboot.)

Does Swift's UnsafeMutablePointer<Float>.allocate(...) actually allocate memory?

I'm trying to understand Swift's unsafe pointer API for the purpose of manipulating audio samples.
The non-mutable pointer variants (UnsafePointer, UnsafeRawPointer, UnsafeBufferPointer) make sense to me; they are all used to reference previously allocated regions of memory on a read-only basis. There is no allocate type method for these variants.
The mutable variants (UnsafeMutablePointer, UnsafeMutableRawPointer), however, are documented as actually allocating the underlying memory. Example from the documentation for UnsafeMutablePointer (here):
static func allocate(capacity: Int)
Allocates uninitialized memory for the specified number of instances of type Pointee
However, there is no mention that UnsafeMutablePointer.allocate(capacity:) can fail, so it cannot actually be allocating memory. Conversely, if it does allocate actual memory, how can you tell whether the allocation failed?
Any insights would be appreciated.
I decided to test this. I ran this program in CodeRunner:
import Foundation
sleep(10)
While the sleep function was executing, CodeRunner reported that this was taking 5.6 MB of RAM on my machine, making our baseline.
I then tried this program:
import Foundation
for _ in 0..<1000000 {
    let ptr = UnsafeMutablePointer<Float>.allocate(capacity: 1)
}
sleep(10)
Now, CodeRunner reports 5.8 MB of RAM usage. A little more than before, but certainly not the extra 4 MB that this should have taken up.
Finally, I assigned something to the pointer:
import Foundation
for _ in 0..<1000000 {
    let ptr = UnsafeMutablePointer<Float>.allocate(capacity: 1)
    ptr.pointee = 0
}
sleep(10)
Suddenly, the program is taking up 21.5 MB of RAM, finally giving us our expected RAM usage increase, although by a larger amount than what I was expecting.
Making a profile in CodeRunner to compile with the optimizations turned on did not seem to make a difference in the behavior I was seeing.
So, surprisingly enough, it does appear that the call to UnsafeMutablePointer.allocate actually does not immediately allocate memory.
Operating systems can cheat a lot when it comes to memory allocations. If you request a block of memory of size N and don't actually put anything in it, the operating system can very well go "sure you can have a block of memory, here you go" and not really do anything with it. It's really more a promise that the memory will be available when used by the program.
Even with the very simple C program below, macOS's Activity Monitor reports 945 kB at first, then 961 kB after calling malloc (which supposedly allocates the memory), and finally 257.1 MB after filling the allocated memory with zeroes.
From the point of view of the program, all 256 MB needed for the array of integers is available immediately after calling malloc, but that's actually a lie.
#include <stdlib.h>
#include <stdio.h>

int main(int argc, char const *argv[])
{
    int count = 64*1024*1024;

    printf("Step 1: No memory allocated yet. Check memory usage for baseline, then press enter to continue (1/3)");
    getchar();

    /* Allocate big block of memory */
    int *p = malloc(count*sizeof(int));
    if (p == NULL) return 1; // failed to allocate

    printf("Step 2: Memory allocated. Check memory usage, then press any key to continue (2/3)");
    getchar();

    /* Fill with zeroes */
    for (int i = 0; i < count; i++) {
        p[i] = 0;
    }

    printf("Step 3: Memory filled with zeroes. Check memory usage, then press any key to continue (3/3)");
    getchar();

    return 0;
}

Why does this code cause "EXC_BAD_INSTRUCTION"?

dispatch_semaphore_t aSemaphore = dispatch_semaphore_create(1);
dispatch_semaphore_wait(aSemaphore, DISPATCH_TIME_FOREVER);
dispatch_release(aSemaphore);
When the program runs to dispatch_release(aSemaphore), it will cause "EXC_BAD_INSTRUCTION", and then crash. Why?
I tried this code and it does indeed die with illegal instruction. So I did some digging and found that it's dying in _dispatch_semaphore_dispose. So let's look at what that is (ARMv7 here, because it's easy to understand!):
__dispatch_semaphore_dispose:
000040a0 b590 push {r4, r7, lr}
000040a2 4604 mov r4, r0
000040a4 af01 add r7, sp, #4
000040a6 e9d40108 ldrd r0, r1, [r4, #32]
000040aa 4288 cmp r0, r1
000040ac da00 bge.n 0x40b0
000040ae defe trap
...
It dies at 0x40ae, which is a duff instruction put there so that the code crashes if the bge.n doesn't branch over it.
The reason it's failing is that r0 must be less than r1. r0 and r1 are loaded from memory at r4 + 32; having traced back up the stack to figure it out, I think r4 is aSemaphore from the example code, i.e. the thing passed into dispatch_semaphore_release. The + 32 means it is reading 32 bytes into the struct that aSemaphore points to (it's a pointer to a dispatch_semaphore_s struct). So overall it is reading 4 bytes from aSemaphore + 32 into r0, and 4 bytes from aSemaphore + 36 into r1.
The compare is then effectively comparing the values stored at aSemaphore + 32 and aSemaphore + 36. Reading what dispatch_semaphore_create does, I can see that it stores the value passed in at both aSemaphore + 32 and aSemaphore + 36. I also found that dispatch_semaphore_wait and dispatch_semaphore_signal touch the value at aSemaphore + 32, decrementing and incrementing it. This means the reason it's breaking is that the current value of the semaphore is less than the value passed into dispatch_semaphore_create. So you can't dispose of a semaphore while its current value is less than the value it was created with.
If you've read to here and understood my ramblings then well done! Hope it helps!
UPDATE:
It's probably better to look at the source (pointed out by JustSid) here - http://opensource.apple.com/source/libdispatch/libdispatch-187.7/src/semaphore.c - looking at the _dispatch_semaphore_dispose function we see:
if (dsema->dsema_value < dsema->dsema_orig) {
DISPATCH_CLIENT_CRASH("Semaphore/group object deallocated while in use");
}
So, yes, there you go, that's why it crashes!
Somewhat more succinct answer: you are creating the semaphore with the wrong value; it should be zero. Creating it with a value of 1 means you are later releasing a semaphore that's still "in use", and GCD deliberately generates an illegal instruction to help you debug the fact that the semaphore may still have waiters on it.
You can create a semaphore with a zero value, but I believe it would just be useless. I had a semaphore field in a class that caused a crash at deinitialization. This is how I fixed it (Swift code):
deinit {
    while (dispatch_semaphore_signal(semaphore) != 0) {}
}
A rather awkward patch, but it works!