Is Unity endian-ness platform independent? - unity3d

I was just wondering if I can reliably expect C# code to be little endian in Unity.
I'm using an int as a bitmap to determine the state of a room where there are four statues. Each statue can have its arms up or down. I use 8 bits to represent the arms. 1 == up 0 == down.
int bit = (int)statueNumber * 2;
if (!isLeftArm) bit += 1;
bool up = (1 == ((roomState >> bit) & 1));
This tells me if an arm is up or down. Eventually I compare "roomState" to another integer representing the "correct" room state. Let's say the correct state is 1010101, then the answer is 85 in little endian. But if it's interpreted as big endian it's another number.
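For what it's worth, shift and mask operators work on an integer's value rather than on its bytes in memory, so the arm test above should give the same answer on any platform. Byte order only becomes visible when the integer is turned into raw bytes. Here is a minimal Python sketch of that distinction; arm_is_up is a hypothetical port of the snippet above, not Unity code:

```python
def arm_is_up(room_state: int, statue_number: int, is_left_arm: bool) -> bool:
    """Hypothetical port of the snippet above: two bits per statue."""
    bit = statue_number * 2
    if not is_left_arm:
        bit += 1
    return ((room_state >> bit) & 1) == 1

correct_state = 0b1010101   # the value 85, whatever the byte order of the machine

# Byte order only appears once the value is serialized to bytes.
little = correct_state.to_bytes(4, "little")   # 0x55 is the first byte
big = correct_state.to_bytes(4, "big")         # 0x55 is the last byte
```

Both byte strings decode back to 85 as long as the same byte order is used for reading and writing.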

Related

Why are the hex numbers for big endian different than little endian?

#include <stdio.h>

typedef unsigned char *byte_pointer;

/* Print len bytes starting at start, lowest address first. */
void show_bytes(byte_pointer start, size_t len)
{
    size_t i;
    for (i = 0; i < len; i++)
    {
        printf(" %.2x", start[i]);
    }
    printf("\n");
}

void show_int(int x)
{
    show_bytes((byte_pointer) &x, sizeof(int));
}

void show_float(float x)
{
    show_bytes((byte_pointer) &x, sizeof(float));
}

void show_pointer(void *x)
{
    show_bytes((byte_pointer) &x, sizeof(void *));
}

int main(void)
{
    int a = 0x12345678;
    byte_pointer ap = (byte_pointer) &a;
    show_bytes(ap, 3); /* first three bytes of a, in memory order */
    return 0;
}
(Solutions according to the CS:APP book)
Big endian: 12 34 56
Little endian: 78 56 34
I know systems have different conventions for storage allocation but if two systems use the same convention but are different endian why are the hex values different?
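To see the two results side by side without needing two machines, here is a small Python sketch that lays out 0x12345678 in both byte orders and prints the first three bytes, as the C program does. The helper name first_bytes_hex is made up for this illustration:

```python
value = 0x12345678

big = value.to_bytes(4, "big")        # most significant byte at the lowest address
little = value.to_bytes(4, "little")  # least significant byte at the lowest address

def first_bytes_hex(data: bytes, count: int) -> str:
    """Format the first count bytes the way show_bytes prints them."""
    return " ".join(f"{b:02x}" for b in data[:count])

print("Big endian:   ", first_bytes_hex(big, 3))     # 12 34 56
print("Little endian:", first_bytes_hex(little, 3))  # 78 56 34
```

The value is the same in both cases; only the order in which its bytes sit in memory differs.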
Endian-ness is an issue that arises when we use more than one storage location for a value/type, which we do because some things won't fit in a single storage location.
As soon as we use multiple storage locations for a single value that gives rise to the question of:  What part of the value will we store in each storage location?
The first byte of a two-byte item will have a lower address than the second byte, and in particular, the address of the second byte will be at +1 from the address of the lower byte.
Storing a two-byte item in two bytes of storage, do we store the most significant byte first and the least significant byte second, or vice versa?
We choose to use directly consecutive bytes for the two bytes of the two-byte item, so no matter which (endian) way we choose to store such an item, we refer to the whole two-byte item by the lower address (the address of its first byte).
We can express these storage choices with formulas; here item[0] refers to the first byte while item[1] refers to the second byte.
item[0] = value >> 8 // also value / 256
item[1] = value & 0xFF // also value % 256
value = (item[0]<<8) | item[1] // also item[0]*256 | item[1]
--vs--
item[0] = value & 0xFF // also value % 256
item[1] = value >> 8 // also value / 256
value = item[0] | (item[1]<<8) // also item[0] | item[1]*256
The first set of formulas is for big endian, and the second for little endian.
By these formulas, it doesn't matter what order we access memory as to whether item[0] first, then item[1], or vice versa, or both at the same time (common in hardware), as long as the formulas for one endian are consistently used.
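The two sets of formulas can be tried out directly. This is only a sketch in Python, with function names invented for the illustration, and it assumes a value that fits in 16 bits:

```python
def store_big(value: int) -> list:
    """Big endian: most significant byte goes in item[0]."""
    return [value >> 8, value & 0xFF]

def load_big(item: list) -> int:
    return (item[0] << 8) | item[1]

def store_little(value: int) -> list:
    """Little endian: least significant byte goes in item[0]."""
    return [value & 0xFF, value >> 8]

def load_little(item: list) -> int:
    return item[0] | (item[1] << 8)

value = 0x1234
assert load_big(store_big(value)) == value        # consistent use round-trips
assert load_little(store_little(value)) == value
mixed = load_little(store_big(value))             # 0x3412: mixing the two swaps the bytes
```

Mixing a store from one convention with a load from the other is exactly the situation that produces "different hex values" for the same data.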
If the item in question is a four-byte value, then there are 4! = 24 possible orderings, though only two of them are truly sensible.
For efficiency, the hardware offers us multibyte memory access in one instruction (and with one reference, namely to the lowest address of the multibyte item), and therefore, the hardware itself needs to define and consistently use one of the two possible/reasonable orderings.
If the hardware did not offer multibyte memory access, then the ordering would be entirely up to the software program itself to define (accessing memory one byte at a time), and the program could choose big or little endian, even differently for each variable, as long as it consistently accesses the multiple bytes of memory in the same manner to reassemble the values stored there.
In a similar manner, when we define a structure of multiple items (e.g. struct point { int x; int y; }), software chooses whether x comes first or y comes first in memory ordering.  However, since programmers (and compilers) will still choose to use hardware instructions to access individual fields such as x in one go, the hardware's endian configuration remains necessary.

Reducing LUT utilization in a Vivado HLS design (RSA cryptosystem using montgomery multiplication)

A question/problem for anyone experienced with Xilinx Vivado HLS and FPGA design:
I need help reducing the utilization numbers of a design within the confines of HLS (i.e. can't just redo the design in an HDL). I am targeting the Zedboard (Zynq 7020).
I'm trying to implement 2048-bit RSA in HLS, using the Tenca-Koç multiple-word radix-2 Montgomery multiplication algorithm, shown below (More algorithm details here):
I wrote this algorithm in HLS and it works in simulation and in C/RTL cosim. My algorithm is here:
#include <ap_int.h> // Vivado HLS arbitrary-precision integer types (ap_uint)
#define MWR2MM_m 2048 // Bit-length of operands
#define MWR2MM_w 8 // word size
#define MWR2MM_e 257 // number of words per operand
// Type definitions
typedef ap_uint<1> bit_t; // 1-bit scan
typedef ap_uint< MWR2MM_w > word_t; // 8-bit words
typedef ap_uint< MWR2MM_m > rsaSize_t; // m-bit operand size
/*
* Multiple-word radix 2 montgomery multiplication using carry-propagate adder
*/
void mwr2mm_cpa(rsaSize_t X, rsaSize_t Yin, rsaSize_t Min, rsaSize_t* out)
{
// extend operands to 2 extra words of 0
ap_uint<MWR2MM_m + 2*MWR2MM_w> Y = Yin;
ap_uint<MWR2MM_m + 2*MWR2MM_w> M = Min;
ap_uint<MWR2MM_m + 2*MWR2MM_w> S = 0;
ap_uint<2> C = 0; // two carry bits
bit_t qi = 0; // an intermediate result bit
// Store concatenations in a temporary variable to eliminate HLS compiler warnings about shift count
ap_uint<MWR2MM_w + 2> temp_concat=0; // w sum bits plus 2 carry bits, so range(9,8) is in bounds
// scan X bit-by bit
for (int i=0; i<MWR2MM_m; i++)
{
qi = (X[i]*Y[0]) xor S[0];
// C gets top two bits of temp_concat, j'th word of S gets bottom 8 bits of temp_concat
temp_concat = X[i]*Y.range(MWR2MM_w-1,0) + qi*M.range(MWR2MM_w-1,0) + S.range(MWR2MM_w-1,0);
C = temp_concat.range(9,8);
S.range(MWR2MM_w-1,0) = temp_concat.range(7,0);
// scan Y and M word-by word, for each bit of X
for (int j=1; j<=MWR2MM_e; j++)
{
temp_concat = C + X[i]*Y.range(MWR2MM_w*j+(MWR2MM_w-1), MWR2MM_w*j) + qi*M.range(MWR2MM_w*j+(MWR2MM_w-1), MWR2MM_w*j) + S.range(MWR2MM_w*j+(MWR2MM_w-1), MWR2MM_w*j);
C = temp_concat.range(9,8);
S.range(MWR2MM_w*j+(MWR2MM_w-1), MWR2MM_w*j) = temp_concat.range(7,0);
S.range(MWR2MM_w*(j-1)+(MWR2MM_w-1), MWR2MM_w*(j-1)) = (S.bit(MWR2MM_w*j), S.range( MWR2MM_w*(j-1)+(MWR2MM_w-1), MWR2MM_w*(j-1)+1));
}
S.range(S.length()-1, S.length()-MWR2MM_w) = 0;
C=0;
}
// if final partial sum is greater than the modulus, bring it back to proper range
if (S >= M)
S -= M;
*out = S;
}
Unfortunately, the LUT utilization is huge.
This is problematic because I need to be able to fit multiple of these blocks in hardware as axi4-lite slaves.
Could someone please provide a few suggestions as to how I can reduce the LUT utilization, WITHIN THE CONFINES OF HLS?
I've already tried the following:
Experimenting with different word lengths
switching the top level inputs to arrays so they are BRAM (i.e. not using ap_uint<2048>, but instead ap_uint<MWR2MM_w> foo[MWR2MM_e])
Experimenting with all sorts of directives: compartmentalizing into multiple inline functions, dataflow architecture, resource limits on lshr, etc.
However, nothing really drives the LUT utilization down in a meaningful way. Is there a glaringly obvious way that I could reduce the utilization that is apparent to anyone?
In particular, I've seen papers on implementations of the mwr2mm algorithm that use only one DSP block and one BRAM. Is this even worth attempting to implement using HLS? Or is there no way that I can actually control the resources that the algorithm is mapped to without describing it in HDL?
Thanks for the help.
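Not an answer to the utilization question, but when restructuring the HLS code it may help to have a software reference to compare against. Below is a sketch of plain bit-serial radix-2 Montgomery multiplication in Python, operating on whole integers rather than words. The function name and the small operands are made up for the example, and it assumes an odd modulus with 0 <= x, y < m < 2**bits:

```python
def mont_mul_radix2(x: int, y: int, m: int, bits: int) -> int:
    """Return x * y * 2**(-bits) mod m for odd m, with 0 <= x, y < m < 2**bits."""
    s = 0
    for i in range(bits):
        xi = (x >> i) & 1
        qi = (xi * (y & 1)) ^ (s & 1)   # same parity test as qi in the posted loop
        s = (s + xi * y + qi * m) >> 1  # the sum is even by construction
    if s >= m:                          # single final correction, as in the post
        s -= m
    return s


# Small sanity check with made-up operands (not RSA-sized).
m, bits = 0xF123, 16
x, y = 0x1234, 0x5678
r = mont_mul_radix2(x, y, m, bits)
assert (r * pow(2, bits, m)) % m == (x * y) % m
```

The check multiplies the result back by 2**bits, which should recover x * y mod m if the model is right.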

How can I extract a specific bit from a 16-bit register using math ONLY?

I have a 16-bit WORD and I want to read the status of a specific bit or several bits.
I've tried a method that divides the word by the bit that I want, converts the result to two values, an integer and a real, and compares the two. If they are not equal, then it equates to false. This appears to work only if I am looking for a bit that is the last 'TRUE' bit in the word. If there are any successive TRUE bits, it fails. Perhaps I just haven't done it right. I don't have the ability to use code, just basic math, boolean operations, and type conversion. Any ideas? I hope this isn't a dumb question but I have a feeling it is.
eg:
WORD 0010000100100100 = 9348
I want to know the value of bit 2. how can i determine it from 9348?
There are many ways, depending on what operations you can use. It appears you don't have much to choose from. But this should work, using just integer division and multiplication, and a test for equality.
(pseudocode):
x = 9348 (binary 0010000100100100, bit 0 = 0, bit 1 = 0, bit 2 = 1, ...)
x = x / 4 (now x is 1000010010010000)
y = (x / 2) * 2 (y is 0000010010010000)
if (x == y) {
(bit 2 must have been 0)
} else {
(bit 2 must have been 1)
}
Every time you divide by 2, you move the bits to the left one position (in your notation, which writes the least significant bit first). Every time you multiply by 2, you move the bits to the right one position. Odd numbers will have 1 in the least significant position. Even numbers will have 0 in the least significant position. If you divide an odd number by 2 in integer math, and then multiply by 2, you lose the odd bit if there was one. So the idea above is to first move the bit you want to know about into the least significant position. Then, divide by 2 and then multiply by two. If the result is the same as what you had before, then there must have been a 0 in the bit you care about. If the result is not the same as what you had before, then there must have been a 1 in the bit you care about.
Having explained the idea, we can simplify to
((x / 8) * 2) <> (x / 4)
which will resolve to true if the bit was set, and false if the bit was not set.
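As a quick check of the idea, here is the same test generalized to any bit position n, sketched in Python with // standing in for integer division. The function name is just for illustration and it assumes a non-negative x:

```python
def bit_is_set(x: int, n: int) -> bool:
    """Test bit n (bit 0 = least significant) using only integer division and multiplication."""
    shifted = x // (2 ** n)              # move bit n into the least significant position
    return (shifted // 2) * 2 != shifted # dropping the odd bit changes the value only if it was 1

x = 9348
bit_2 = bit_is_set(x, 2)                       # True
simplified = ((x // 8) * 2) != (x // 4)        # the simplified form for bit 2, also True
```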
AND the word with a mask [1].
In your example, you're interested in the second bit, so the mask (in binary) is
00000010. (Which is 2 in decimal.)
In binary, your word 9348 is 0010010010000100 [2]
0010010010000100 (your word)
AND 0000000000000010 (mask)
----------------
0000000000000000 (result of ANDing your word and the mask)
Because the result is equal to zero, the bit is not set. If it were nonzero, the bit would be set.
This technique works for extracting one bit at a time. You can however use it repeatedly with different masks if you're interested in extracting multiple bits.
[1] For more information on masking techniques see http://en.wikipedia.org/wiki/Mask_(computing)
[2] See http://www.binaryhexconverter.com/decimal-to-binary-converter
The nth bit is equal to floor(word / 2^n) mod 2, that is, the word integer-divided by 2^n, taken mod 2.
I think you'll have to test each bit, 0 through 15 inclusive.
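A short Python sketch of that formula applied to every bit of a 16-bit word; bits_of is a name invented here and the word is assumed non-negative:

```python
def bits_of(word: int, width: int = 16) -> list:
    """Bit n of word is (word // 2**n) % 2, for n = 0 .. width - 1."""
    return [(word // 2 ** n) % 2 for n in range(width)]

bits = bits_of(9348)
set_positions = [n for n, b in enumerate(bits) if b == 1]   # [2, 7, 10, 13]
```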
You could try 9348 AND 4 (equivalent of 1<<2 - index of the bit you wanted)
9348 AND 4
should give 4 if bit is set, 0 if not.
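If bitwise operators are available, the mask test looks like this in Python. The helper name is hypothetical, and bit numbers here start at 0 for the least significant bit:

```python
def is_bit_set(word: int, n: int) -> bool:
    """AND the word with a mask that has only bit n set."""
    mask = 1 << n
    return (word & mask) != 0

masked = 9348 & 4        # 4, because bit 2 is set
```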
So here is what I have come up with: 3 solutions. One is Hatchet's as proposed above, and his answer helped me immensely with actually understanding HOW this works, which is of utmost importance to me! The proposed AND masking solutions could have worked if my system supports bitwise operators, but it apparently does not.
Original technique:
( ( ( INT ( TAG / BIT ) ) / 2 ) - ( INT ( ( INT ( TAG / BIT ) ) / 2 ) ) <> 0 )
Explanation:
in the first part of the equation, integer division is performed on TAG/BIT, then REAL division by 2. In the second part, integer division is performed TAG/BIT, then integer division again by 2. The difference between these two results is compared to 0. If the difference is not 0, then the formula resolves to TRUE, which means the specified bit is also TRUE.
eg: 9348/4 = 2337 w/ integer division. Then 2337/2 = 1168.5 w/ REAL division but 1168 w/ integer division. 1168.5-1168 <> 0, so the result is TRUE.
My modified technique:
( INT ( TAG / BIT ) / 2 ) <> ( INT ( INT ( TAG / BIT ) / 2 ) )
Explanation:
effectively the same as above, but instead of subtracting the two results and comparing them to 0, I am just comparing the two results themselves. If they are not equal, the formula resolves to TRUE, which means the specified bit is also TRUE.
eg: 9348/4 = 2337 w/ integer division. Then 2337/2 = 1168.5 w/ REAL division but 1168 w/ integer division. 1168.5 <> 1168, so the result is TRUE.
Hatchet's technique as it applies to my system:
( INT ( TAG / BIT )) <> ( INT ( INT ( TAG / BIT ) / 2 ) * 2 )
Explanation:
in the first part of the equation, integer division is performed on TAG/BIT. In the second part, integer division is performed TAG/BIT, then integer division again by 2, then multiplication by 2. The two results are compared. If they are not equal, the formula resolves to TRUE, which means the specified bit is also TRUE.
eg: 9348/4 = 2337. Then 2337/2 = 1168 w/ integer division. Then 1168x2=2336. 2337 <> 2336 so the result is TRUE. As Hatchet stated, this method 'drops the odd bit'.
Note - 9348/4 = 2337 w/ both REAL and integer division, but it is important that these parts of the formula use integer division and not REAL division (12164/32 = 380 w/ integer division and 380.125 w/ REAL division)
I feel it important to note for any future readers that the BIT value in the equations above is not the bit number, but the actual value of the resulting decimal if the bit in the desired position was the only TRUE bit in the binary string (bit 2 = 4 (2^2), bit 6 = 64 (2^6))
This explanation may be a bit too verbose for some, but may be perfect for others :)
Please feel free to comment/critique/correct me if necessary!
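For anyone who wants to experiment with these formulas outside the original system, here is a rough Python emulation. It assumes non-negative values and that INT(...) truncates, which // reproduces here; the function names are invented for this sketch:

```python
def original_test(tag: int, bit: int) -> bool:
    quotient = tag // bit                          # INT(TAG / BIT)
    return (quotient / 2) - (quotient // 2) != 0   # REAL half minus integer half

def hatchet_test(tag: int, bit: int) -> bool:
    quotient = tag // bit
    return quotient != (quotient // 2) * 2         # drops the odd bit, then compares

def real_division_test(tag: int, bit: int) -> bool:
    """What goes wrong if the first division is REAL instead of integer."""
    quotient = tag / bit
    return (quotient / 2) != int(quotient / 2)

ok = original_test(12164, 32)        # False: bit 5 of 12164 is not set
wrong = real_division_test(12164, 32)  # True: 380.125 / 2 is not a whole number
```

The last two lines reproduce the 12164/32 case from the note: with REAL division in the first step the test reports a bit that is not actually set.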
I just needed to resolve an integer status code to a bit state in order to interface with some hardware. Here's a method that works for me:
private bool resolveBitState(int value, int bitNumber)
{
return (value & (1 << (bitNumber - 1))) != 0;
}
I like it, because it's non-iterative, requires no cast operations and essentially translates directly to machine code operations like Shift, And and Comparison, which probably means it's really optimal.
To explain in a little more detail, I'm comparing the bitwise value to a mask for the bit I am interested in (value & mask) using an AND operation. If the bitwise AND operation result is zero, then the bit is not set (return false). If the AND operation result is not zero, then the bit is set (return true). The result of the AND operation is either zero or the value of the bit (1, 2, 4, 8, 16, 32...). Hence the boolean evaluation comparing the AND operation result and 0. The mask is created by taking the number 1 and shifting it left (bit wise), by the appropriate number of binary places (1 << n). The number of places is the number of the bit targeted minus 1. If it's bit #1, I want to shift the 1 left by 0 and if it's #2, I want to shift it left 1 place, etc.
I'm surprised no one rates my solution. I think it's the most logical and succinct... and works.

Choosing values for constants

One thing I've never really understood is why in many libraries, constants are defined like this:
public static final int DM_FILL_BACKGROUND = 0x2;
public static final int DM_FILL_PREVIOUS = 0x3;
public static final int TRANSPARENCY_MASK = 1 << 1;
public static final int TRANSPARENCY_PIXEL = 1 << 2;
What's up with the 0x and << stuff? Why aren't people just using ordinary integer values?
The bit shifting of 1 is usually for situations where you have non-exclusive values that you want to store.
For example, say you want to be able to draw lines on any side of a box. You define:
LEFT_SIDE = 1 << 0 # binary 0001 (1)
RIGHT_SIDE = 1 << 1 # binary 0010 (2)
TOP_SIDE = 1 << 2 # binary 0100 (4)
BOTTOM_SIDE = 1 << 3 # binary 1000 (8)
----
0111 (7) = LEFT_SIDE | RIGHT_SIDE | TOP_SIDE
Then you can combine them for multiple sides:
DrawBox (LEFT_SIDE | RIGHT_SIDE | TOP_SIDE) # Don't draw line on bottom.
The fact that they're using totally different bits means that they're independent of each other. By ORing them you get 1 | 2 | 4 which is equal to 7 and you can detect each individual bit with other boolean operations (see here and here for an explanation of these).
If they were defined as 1, 2, 3 and 4 then you'd probably either have to make one call for each side or you'd have to pass four different parameters, one per side. Otherwise you couldn't tell the difference between LEFT and RIGHT (1 + 2 = 3) and TOP (3), since both of them would be the same value (with a simple addition operation).
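A small sketch of the same idea in Python. The constant names follow the example above, while sides_in is a helper invented here to show how the individual bits can be read back:

```python
LEFT_SIDE, RIGHT_SIDE, TOP_SIDE, BOTTOM_SIDE = (1 << n for n in range(4))

def sides_in(mask: int) -> list:
    """List the sides whose bit is set in mask."""
    names = {LEFT_SIDE: "left", RIGHT_SIDE: "right", TOP_SIDE: "top", BOTTOM_SIDE: "bottom"}
    return [name for flag, name in names.items() if mask & flag]

box = LEFT_SIDE | RIGHT_SIDE | TOP_SIDE   # 7
drawn = sides_in(box)                     # ['left', 'right', 'top']

# With sequential values 1, 2, 3, 4 the combination is ambiguous:
ambiguous = (1 + 2 == 3)                  # "left + right" looks the same as "top"
```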
The 0x stuff is just hexadecimal numbers which are easier to see as binary bitmasks (each hexadecimal digit corresponds exactly with four binary digits). You'll tend to see patterns like 0x01, 0x02, 0x04, 0x08, 0x10, 0x20 and so on, since they're the equivalent of a single 1 bit moving towards the most significant bit position - those values are equivalent to binary 00000001, 00000010, 00000100, 00001000, 00010000, 00100000 and so on.
Aside: Once you get used to hex, you rarely have to worry about the 1 << n stuff. You can instantly recognise 0x4000 as binary 0100 0000 0000 0000. That's less obvious if you see the value 16384 in the code although some of us even recognise that :-)
Regarding << stuff: this is my preferred way.
When I need to define a constant with 1 in the bit 2 position, and 0 in all other bits, I can define it as 4, 0x4 or 1<<2. 1<<2 is more readable, in my opinion, and explains exactly the purpose of this constant.
BTW, all these ways give the same performance, since calculations are done at compile time.

hash function providing unique uint from an integer coordinate pair

The problem in general:
I have a big 2d point space, sparsely populated with dots.
Think of it as a big white canvas sprinkled with black dots.
I have to iterate over and search through these dots a lot.
The Canvas (point space) can be huge, bordering on the limits
of int and its size is unknown before setting points in there.
That brought me to the idea of hashing:
Ideal:
I need a hash function taking a 2D point, returning a unique uint32.
So that no collisions can occur. You can assume that the number of
dots on the Canvas is easily countable by uint32.
IMPORTANT: It is impossible to know the size of the canvas beforehand
(it may even change),
so things like
canvaswidth * y + x
are sadly out of the question.
I also tried a very naive
abs(x) + abs(y)
but that produces too many collisions.
Compromise:
A hash function that provides keys with a very low probability of collision.
Cantor's enumeration of pairs
n = ((x + y)*(x + y + 1)/2) + y
might be interesting, as it's closest to your original canvaswidth * y + x but will work for any non-negative x or y. But for a real world int32 hash, rather than a mapping of pairs of integers to integers, you're probably better off with a bit manipulation such as Bob Jenkins' mix and calling that with x, y and a salt.
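Here is the enumeration as a Python sketch. It assumes non-negative x and y, and note that the result grows roughly quadratically, so it will not stay within 32 bits for large coordinates:

```python
def cantor_pair(x: int, y: int) -> int:
    """Cantor's enumeration of pairs of non-negative integers."""
    return (x + y) * (x + y + 1) // 2 + y

first_few = [cantor_pair(0, 0), cantor_pair(1, 0), cantor_pair(0, 1)]   # [0, 1, 2]
```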
a hash function that is GUARANTEED collision-free is not a hash function :)
Instead of using a hash function, you could consider using binary space partition trees (BSPs) or XY-trees (closely related).
If you want to hash two uint32's into one uint32, do not use things like Y & 0xFFFF because that discards half of the bits. Do something like
(x * 0x1f1f1f1f) ^ y
(you need to transform one of the variables first to make sure the hash function is not commutative)
Like Emil, but handles 16-bit overflows in x in a way that produces fewer collisions, and takes fewer instructions to compute:
hash = ( y << 16 ) ^ x;
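A sketch of this shift-and-xor hash in Python, with an explicit mask to imitate 32-bit unsigned arithmetic (Python integers do not overflow on their own). It is unique while both coordinates fit in 16 bits and starts colliding beyond that:

```python
MASK32 = 0xFFFFFFFF

def shift_xor_hash(x: int, y: int) -> int:
    """hash = (y << 16) ^ x, truncated to 32 bits."""
    return ((y << 16) ^ x) & MASK32

a = shift_xor_hash(0x10000, 0)   # x overflows 16 bits ...
b = shift_xor_hash(0, 1)         # ... and lands on the same value as this point
```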
You can recursively divide your XY plane into cells, then divide these cells into sub-cells, etc.
Gustavo Niemeyer invented the Geohash geocoding system in 2008.
Amazon's open source Geo Library computes the hash for any longitude-latitude coordinate. The resulting Geohash value is a 63 bit number. The probability of collision depends on the hash's resolution: if two objects are closer than the intrinsic resolution, the calculated hash will be identical.
Read more:
https://en.wikipedia.org/wiki/Geohash
https://aws.amazon.com/fr/blogs/mobile/geo-library-for-amazon-dynamodb-part-1-table-structure/
https://github.com/awslabs/dynamodb-geo
Your "ideal" is impossible.
You want a mapping (x, y) -> i where x, y, and i are all 32-bit quantities, which is guaranteed not to generate duplicate values of i.
Here's why: suppose there is a function hash() such that hash(x, y) gives a different integer value for every distinct pair. There are 2^32 (about 4 billion) values for x, and 2^32 values for y. So hash(x, y) needs 2^64 (about 18 million trillion) distinct results. But there are only 2^32 possible values in a 32-bit int, so the result of hash() won't fit in a 32-bit int.
See also http://en.wikipedia.org/wiki/Counting_argument
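The counting argument can be watched in miniature. The sketch below shrinks the word size to 4 bits so every pair can be enumerated; toy_hash is an arbitrary made-up mix, and any other function would hit the same limit:

```python
from itertools import product

BITS = 4                      # toy word size standing in for 32
SIZE = 1 << BITS              # 16 possible values per coordinate and per hash

def toy_hash(x: int, y: int) -> int:
    """Arbitrary example mix, truncated to BITS bits."""
    return ((y << 2) ^ (x * 7)) & (SIZE - 1)

pairs = list(product(range(SIZE), repeat=2))       # 256 distinct points
outputs = {toy_hash(x, y) for x, y in pairs}       # at most 16 distinct hashes
collisions_at_least = len(pairs) - len(outputs)    # >= 240 whatever toy_hash does
```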
Generally, you should always design your data structures to deal with collisions. (Unless your hashes are very long (at least 128 bit), very good (use cryptographic hash functions), and you're feeling lucky).
Perhaps?
hash = ((y & 0xFFFF) << 16) | (x & 0xFFFF);
Works as long as x and y can be stored as 16 bit integers. No idea about how many collisions this causes for larger integers, though. One idea might be to still use this scheme but combine it with a compression scheme, such as taking the modulus of 2^16.
If you can do a = ((y & 0xffff) << 16) | (x & 0xffff) then you could afterward apply a reversible 32-bit mix to a, such as Thomas Wang's
#include <stdint.h>

uint32_t hash(uint32_t a)
{
    a = (a ^ 61) ^ (a >> 16);
    a = a + (a << 3);
    a = a ^ (a >> 4);
    a = a * 0x27d4eb2d;
    a = a ^ (a >> 15);
    return a;
}
That way you get a random-looking result rather than high bits from one dimension and low bits from the other.
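A Python transcription of the pack-then-mix idea, as a sketch. Python integers are unbounded, so the 32-bit wraparound of the C version is imitated with an explicit mask; point_hash is a name made up here:

```python
MASK32 = 0xFFFFFFFF

def wang_mix(a: int) -> int:
    """32-bit mix from the C code above, with wraparound made explicit."""
    a &= MASK32
    a = (a ^ 61) ^ (a >> 16)
    a = (a + (a << 3)) & MASK32
    a ^= a >> 4
    a = (a * 0x27D4EB2D) & MASK32
    a ^= a >> 15
    return a

def point_hash(x: int, y: int) -> int:
    """Pack two 16-bit coordinates into 32 bits, then mix."""
    return wang_mix(((y & 0xFFFF) << 16) | (x & 0xFFFF))
```

Because every step of the mix is reversible, points whose coordinates fit in 16 bits should still get distinct hashes after mixing.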
You can do
a >= b ? a * a + a + b : a + b * b
taken from here.
That works for points in positive plane. If your coordinates can be in negative axis too, then you will have to do:
A = a >= 0 ? 2 * a : -2 * a - 1;
B = b >= 0 ? 2 * b : -2 * b - 1;
A >= B ? A * A + A + B : A + B * B;
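The formula above looks like Szudzik's pairing function. Here is a Python sketch of it together with the signed-to-unsigned mapping; the function names are chosen for this illustration, and since Python integers are unbounded no overflow handling is shown:

```python
def to_natural(n: int) -> int:
    """Map ..., -2, -1, 0, 1, 2, ... to 3, 1, 0, 2, 4, ... so negatives are allowed."""
    return 2 * n if n >= 0 else -2 * n - 1

def szudzik_pair(a: int, b: int) -> int:
    """Pairing for non-negative a, b."""
    return a * a + a + b if a >= b else a + b * b

def signed_pair(a: int, b: int) -> int:
    return szudzik_pair(to_natural(a), to_natural(b))
```

For non-negative inputs below n, the outputs fill exactly the range 0 .. n*n - 1, which is why the output needs about twice as many bits as each input.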
But to restrict the output to uint you will have to keep an upper bound on your inputs, and if so, it turns out that you know the bounds. In other words, in programming it's impractical to write a function without having an idea of the integer types your inputs and output can be, and if you do, there will definitely be a lower bound and upper bound for every integer type.
public uint GetHashCode(whatever a, whatever b)
{
if (a > ushort.MaxValue || b > ushort.MaxValue ||
a < ushort.MinValue || b < ushort.MinValue)
{
throw new ArgumentOutOfRangeException();
}
return (uint)a * (ushort.MaxValue + 1u) + (uint)b; //very good space/speed efficiency; the multiplier must be 65536 or in-range pairs would collide
//or whatever your function is.
}
If you want output to be strictly uint for unknown range of inputs, then there will be reasonable amount of collisions depending upon that range. What I would suggest is to have a function that can overflow but unchecked. Emil's solution is great, in C#:
return unchecked((uint)((a & 0xffff) << 16 | (b & 0xffff)));
See Mapping two integers to one, in a unique and deterministic way for a plethora of options..
According to your use case, it might be possible to use a Quadtree and replace points with the string of branch names. It is actually a sparse representation for points and will need a custom Quadtree structure that extends the canvas by adding branches when you add points off the canvas but it avoids collisions and you'll have benefits like quick nearest neighbor searches.
If you're already using a language or platform where all objects (even primitive ones like integers) have built-in hash functions (Java platform languages like Java, .NET platform languages like C#, and others like Python, Ruby, etc.), you may use the built-in hash values as a building block and add your own "hashing flavor" to the mix. Like:
// C# code snippet
public class SomeVerySimplePoint {
public int X;
public int Y;
public override int GetHashCode() {
return ( Y.GetHashCode() << 16 ) ^ X.GetHashCode();
}
}
It may also be handy to have test cases, such as a predefined million-point set, that you run against each candidate hash algorithm to compare aspects like computation time, memory required, key collision count, and edge cases (very large or very small values).
The Fibonacci hash works very well for integer pairs.
Multiplier (32-bit): 0x9E3779B9
For other word sizes w: multiplier = 2^w / phi = 2^w * (sqrt(5)-1)/2, rounded to an odd integer.
hash = a1 + a2*multiplier
This will give very different values for pairs that are close together.
I do not know how it behaves over all pairs.
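A Python sketch of this, assuming 32-bit wraparound for the result. The multiplier is derived from the formula rather than hard-coded, and the function names are invented for the example:

```python
import math

def fibonacci_multiplier(w: int = 32) -> int:
    """2**w / phi, truncated and forced odd."""
    value = int((math.sqrt(5) - 1) / 2 * 2 ** w)
    return value | 1

MULTIPLIER = fibonacci_multiplier(32)     # 0x9E3779B9

def pair_hash(a1: int, a2: int) -> int:
    """a1 + a2 * multiplier, truncated to 32 bits."""
    return (a1 + a2 * MULTIPLIER) & 0xFFFFFFFF

neighbours = [pair_hash(10, 10), pair_hash(10, 11), pair_hash(11, 10)]
```

Float precision is enough for the 32-bit constant; for much larger word sizes an exact integer square root would be a safer way to derive the multiplier.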