I'm using an open-source networking framework that makes it easy for developers to communicate over a service found using Bonjour in Objective-C.
There are a few lines that have had me on edge for a while now, even though they never seem to have caused any problems on any machines I've tested, regardless of whether I'm running the 32-bit or 64-bit version of my application:
int packetLength = [rawPacketData length];
[outgoingBuffer appendBytes:&packetLength length:sizeof(int)];
[outgoingBuffer appendData:rawPacketData];
[self writeToStream];
Note that the first piece of information sent is the length of the data packet, which is pretty standard, and then the data itself is sent. What scares me is the length of the length. Will one machine ever assume an int is 4 bytes, while the other machine believes an int to be 8 bytes?
If the two sizes could be different on different machines, what would cause this? Is it dependent on my compiler, or the end-user's machine architecture? And finally, if it is a problem, how can I take an 8-byte int and scrunch it down to 4-bytes to ensure backwards compatibility? (Since I'll never need more than 4 bytes to represent the size of the data packet.)
You can't assume that sizeof(int) will always be four bytes. If the size matters, you should either hard-code a size of 4 (and write code to serialize values into four-byte arrays with the proper endianness), or use types like int32_t defined in <stdint.h>.
(However, as a practical matter, most compiler vendors have decided that int should stay four bytes, so you probably don't need to worry about everything breaking tomorrow. Then again, it wasn't so long ago that many compiler vendors let an int be two bytes, leading to many problems when ints became four bytes, so you really ought to do things the right way to guard against future changes.)
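To make that concrete, here is a minimal sketch (plain C-style code; the helper names are mine) of the hard-coded 4-byte approach: the length is narrowed to a uint32_t and written byte by byte in big-endian (network) order, so both machines agree on the wire format regardless of sizeof(int) or native endianness.
#include <stdint.h>

// Write the packet length as exactly 4 bytes, most significant byte first.
// The caller appends these 4 bytes to the outgoing buffer, then the payload itself.
static void write_length_prefix(uint32_t length, uint8_t out[4]) {
    out[0] = (uint8_t)(length >> 24);
    out[1] = (uint8_t)(length >> 16);
    out[2] = (uint8_t)(length >> 8);
    out[3] = (uint8_t)(length);
}

// The receiving side reassembles the same 4 bytes, independent of its own int size.
static uint32_t read_length_prefix(const uint8_t in[4]) {
    return ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16) |
           ((uint32_t)in[2] << 8)  |  (uint32_t)in[3];
}
On the Objective-C side you would then append those 4 bytes to outgoingBuffer instead of appending &packetLength directly.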
It could be different, but this depends on the compiler more than the machine. A different compiler on the same machine might define int to be 8 bytes.
The size of int depends on the machine architecture; it will almost always match the width of the data bus, unless your C compiler does something special and changes it.
That means the size of int is not necessarily 4 bytes when you compile your program on an 8-bit, 16-bit or 64-bit machine/architecture.
I would define a constant for the buffer size instead of using sizeof(int).
Hope this answers your question.
I am currently refactoring a large chunk of old code and have finally dived into the HLSL section, where my knowledge is minimal due to being out of practice. I've come across some documentation online that specifies which registers are to be used for which purposes:
t – for shader resource views (SRV)
s – for samplers
u – for unordered access views (UAV)
b – for constant buffer views (CBV)
This part is pretty self-explanatory. If I want to create a constant buffer, I can just declare one as:
cbuffer LightBuffer: register(b0) { };
cbuffer CameraBuffer: register(b1) { };
cbuffer MaterialBuffer: register(b2) { };
cbuffer ViewBuffer: register(b3) { };
However, originating from the world of MIPS Assembly I can't help but wonder if there are finite and restricted ranges on these. For example, temporary registers are restricted to a range of t0 - t7 in MIPS Assembly. In the case of HLSL I haven't been able to find any documentation surrounding this topic as everything seems to point to assembly languages and microprocessors (such as the 8051 if you'd like a random topic to read up on).
Is there a set range for the four register types in HLSL or do I just continue as much as needed in a sequential fashion and let the underlying assembly handle the messy details?
Note
I have answered this question partially, as I am unable to find a range for u currently; however, if someone has a better, more detailed answer than what I've given through testing, then feel free to post it and I will mark that as the correct answer. I will leave this question open until December 1st, 2018 to give others a chance to give a better answer for future readers.
Resource slot counts (for D3D11; the D3D12 case expands them) are specified on the Resource Limits MSDN page.
The ones of interest for you here are:
D3D11_COMMONSHADER_INPUT_RESOURCE_REGISTER_COUNT (which is t) = 128
D3D11_COMMONSHADER_SAMPLER_SLOT_COUNT (which is s) = 16
D3D11_COMMONSHADER_CONSTANT_BUFFER_HW_SLOT_COUNT (which is b) = 15, but one slot is reserved to store constant data coming from the shader itself (if you have a large static const array, for example)
The u case is different, as it depends on the feature level (and, to be honest, is a vendor/OS-version mess):
D3D_FEATURE_LEVEL_11_1 or greater: 64 slots
D3D_FEATURE_LEVEL_11_0: it will always be 8 (but some cards/drivers actually support 64; you need at least Windows 8 for that, and it might also be available on Windows 7 with the Platform Update). I do not recall a way to test whether 64 is supported (many NVIDIA cards in their 700 range do, for example).
D3D_FEATURE_LEVEL_10_1: either 0 or 1; there is a way to check whether compute is supported.
You need to perform a feature check:
D3D11_FEATURE_DATA_D3D10_X_HARDWARE_OPTIONS checkData = {};
d3dDevice->CheckFeatureSupport(D3D11_FEATURE_D3D10_X_HARDWARE_OPTIONS, &checkData, sizeof(checkData));
BOOL computeSupport = checkData.ComputeShaders_Plus_RawAndStructuredBuffers_Via_Shader_4_x;
Please note that on some OS/driver versions I had this flag return TRUE even though the feature was not supported (Intel did that on Win7/8), so in that case the only reliable solution was to try to create a small raw/byte-address buffer or a structured buffer and check the HRESULT, along the lines of the sketch below.
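A rough sketch of that fallback (buffer size, flags and variable names are just my example choices): actually try to create a tiny raw/byte-address buffer and trust the HRESULT rather than the capability flag.
// Try to create a small raw (byte address) buffer; if creation fails, treat
// raw/structured buffer (and compute) support as absent even if the flag said otherwise.
D3D11_BUFFER_DESC desc = {};
desc.ByteWidth = 16;                                    // tiny, must be a multiple of 4
desc.Usage     = D3D11_USAGE_DEFAULT;
desc.BindFlags = D3D11_BIND_SHADER_RESOURCE | D3D11_BIND_UNORDERED_ACCESS;
desc.MiscFlags = D3D11_RESOURCE_MISC_BUFFER_ALLOW_RAW_VIEWS;

ID3D11Buffer* rawBuffer = nullptr;
HRESULT hr = d3dDevice->CreateBuffer(&desc, nullptr, &rawBuffer);
bool rawBufferSupport = SUCCEEDED(hr);
if (rawBuffer) rawBuffer->Release();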
As a side note, feature level 10 or below covers quite old configurations nowadays, so except for rare scenarios you can probably safely ignore it (I just leave it here for information purposes).
Since questions of this type usually have a long wait time, I tested the b register by attempting to create a cbuffer in register b51. This failed as I expected, and luckily SharpDX spat out an exception stating the maximum is 14. So, for the sake of future readers, I tested all four register types and am posting back the ranges I found successful.
b has a range of b0 - b13.
s has a range of s0 - s15.
t has a range of t0 - t127.
u has a range that is still undetermined (see below).
At the current moment, I am unable to find a range for the u register as I have no examples of it in my code, and haven't actually ever used it. If someone comes along that does have an example usage then feel free to test it and update this post for future readers.
I did find a contradiction to my findings above in the documentation linked in my question; they have an example using a t register above the noted range in this answer:
Texture2D a[10000] : register(t0);
Texture2D b[10000] : register(t10000);
ConstantBuffer<myConstants> c[10000] : register(b0);
Note
I would like to point out that I am using the SharpDX version of the HLSL compiler and so I am unsure if these ranges vary from compiler to compiler; I heavily doubt that they do, but you can never be too sure until you try to exceed them. GLSL may be the same due to being similar to HLSL, but it could also be very different.
TL;DR
I'm trying to talk to a Minecraft server with a client written in Scala, using Akka I/O over TCP, and would like to know if Minecraft uses an official, standardised protocol in its network communication?
Minecraft's own documentation covers the contents of each packet, but fails to explain how the packets themselves are encoded, or even how they should be formed.
A little back story
As part of a personal project that I'm working on, written in Scala, I need to create an interface capable of mocking a Minecraft client, and performing actions against a Minecraft server. After weeks of research, I came across a lot of Java libraries that were almost what I was looking for, but none that quite suited my exact needs; long story short, I did the classic, "Oh well, why not write it myself and enjoy the learning curve"...
The issue
The Minecraft protocol documentation is thorough in some respects but lacking in others: many assumptions are made throughout, and a lot of key information is missing or even incorrect, a detailed network specification being the most notable omission in my case.
One attempt to talk to the Minecraft server had me playing around with Google's protocol buffers, using ScalaPB to compile them to usable case classes, but the data types were a pain to resolve between Google's own documentation and Minecraft's.
message Handshake {
  <type?> protocolVersion = 1;
  <type?> host = 2;
  <type?> port = 3;
  <type?> nextState = 4;
}
The host is a string, so that's an easy win, but both the protocolVersion and nextState are variable-length integers, which were not encoded as expected when I compared them with valid packets generated by another client with identical contents (I've been using a third-party library to compare the hexadecimal output of encoded packets).
My hacky solution
In a last-ditch attempt to achieve my goals, I've simply written methods like the one below (this is also a first iteration, so be kind!) to generate the desired encoding for each of the types declared in Minecraft's documentation that are not supported natively in Scala. Although this works, it smells like I'm missing something potentially obvious that others might know about.
import scala.collection.mutable.ArrayBuffer

def toVarint(x: Int): Array[Byte] = {
  var number = x
  val output = ArrayBuffer[Int]()
  // Emit 7 bits at a time, least-significant group first, setting the high bit
  // on every byte that has more data following it.
  while ((number & ~0x7F) != 0) {
    output += (number & 0x7F) | 0x80
    number >>>= 7 // unsigned shift so negative inputs also terminate correctly
  }
  output += number & 0x7F
  output.map(_.toByte).toArray
}
This is old, but I'll answer anyway! The wiki you referenced has (at least now) a section on the packet format (both before & after compression), as well as explanations including example code for how to handle their varint & varlong types!
To summarise, each packet is length prefixed with a varint (in both compression modes), so you just need to read a varint from the stream, allocate that much space & read that many bytes from the stream to the buffer.
Each byte of a Minecraft varint has an "another byte to follow?" flag bit, followed by 7 bits of actual data. Those 7 bits are just the bits of a standard int, so on receive you essentially strip the flag bits and stitch the remaining groups back together into a standard int type.
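As an illustration of that scheme, here is a minimal decoding sketch in plain C++ (readVarInt and readByte are hypothetical names of mine, not the wiki's sample code): each byte contributes 7 payload bits, least-significant group first, and the top bit says whether another byte follows.
#include <cstdint>
#include <functional>
#include <stdexcept>

// Decode one VarInt; readByte is assumed to return the next raw byte from the stream.
int32_t readVarInt(const std::function<uint8_t()>& readByte) {
    uint32_t result = 0;
    int shift = 0;
    while (true) {
        const uint8_t b = readByte();
        result |= static_cast<uint32_t>(b & 0x7F) << shift;  // 7 payload bits, LSB group first
        if ((b & 0x80) == 0) break;                          // flag bit clear: last byte
        shift += 7;
        if (shift >= 35) throw std::runtime_error("VarInt is too long");
    }
    return static_cast<int32_t>(result);
}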
I've been reading through this section for a while, but I can't seem to figure it out. I'm on AMD64 ABI Draft 0.99.6, page 18, section 3.2.3 Parameter Passing, and there's the following text:
Arguments of type __m256 are split into four eightbyte chunks. The least significant one belongs to class SSE and all the others to class SSEUP.
I'm confused because it sounds like I use three SSEUP registers and only one SSE, but that seems wasteful of the other two SSE registers associated with the SSEUP. Am I misreading it? I probably won't even use this datatype, but I've been confused on this text for quite a while. Can someone give an example of how this would work? I'm probably missing something obvious.
Page 18 just contains a list of definitions necessary for a later discussion of the algorithm used to pass the parameters of a function.
In particular, the SSE class is always passed in a new vector register, the first available of %xmm0-%xmm7.
Note that these names refer to the lower 128-bit parts of the registers, but it's better to think of them as variable-size vector registers %v0-%v7.
The SSEUP class is passed in the next available eightbyte chunk of the last vector register used.
__m256 is then passed, on processors that support AVX, using a single %ymm register: the lowest eightbyte gets the SSE class - and hence a new register, say %v0 - while the other three eightbyte chunks get SSEUP, thereby reusing the same %v0 register (see the sketch after the quote below).
Here's the relevant quote from the document:
If the class is SSE, the next available vector register is used, the registers
are taken in the order from %xmm0 to %xmm7.
If the class is SSEUP, the eightbyte is passed in the next available eightbyte
chunk of the last used vector register.
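As a concrete illustration (a hypothetical example of mine, not taken from the ABI document): a function that takes a single __m256 plus a float and is compiled with AVX receives the vector entirely in the first vector register and the float in the next one.
#include <immintrin.h>

// With AVX, 'v' is classified as one SSE eightbyte plus three SSEUP eightbytes, so the
// whole 256-bit value arrives in the first vector register (%ymm0); 's' is a separate
// SSE argument and takes the next vector register (%xmm1).
__m256 scale(__m256 v, float s) {
    return _mm256_mul_ps(v, _mm256_set1_ps(s));
}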
The SSEUP class was introduced earlier in the ABI and it is still present today.
You can quickly consult Version 0.9 to see the differences: the types __m256 and __m512 were not present, for example.
For compilers that don't support the new ABI with the __m256 type, or for compilers that do support it but target processors without AVX, that type is usually an aggregate of two __m128 and thus, by the rules described later (particularly the post-merge rules), it is passed in memory:
For compilers using the old ABI:
If the size of an object is larger than two eightbytes, or in C++, is a non-POD structure or union type, or contains unaligned fields, it has class MEMORY.
For compilers using the new ABI:
If the size of the aggregate exceeds two eightbytes and the first eightbyte isn't SSE or any other eightbyte isn't SSEUP, the whole argument is passed in memory.
The standard is admittedly confusing, mostly due to the need to address backward compatibility; the SSE and SSEUP classifications are handy in an architecture where the vector registers keep widening and a broad range of different sizes is already present out there.
So if you want to look at the sync block for an object, under SOS you have to look at offset -4 (on 32-bit machines) from the object address. Does anyone know what the wisdom is behind going back 4 bytes? I mean, they could have put the sync block at 0, the type handle at +4 and the object fields at +8.
This is an implementation detail, so I can't give you the exact reason for the placement of the syncblock. However, if you look at the shared source CLI, you'll see that the runtime has all sorts of optimizations for how objects are allocated and used, and actually the data associated with a single instance is located in several different places. The syncblock for instance is just an index value for a structure located elsewhere. Similarly the MethodTable and the EEClass are stored elsewhere. These are all implementation details. The important point IMO is understanding how to dig out the information needed during debugging. It is of much less importance to understand why the implementation details are as they are.
I'd say it matches expectations, especially for structs that have been explicitly laid out. As Brian says, it's just an implementation detail though. It's similar to how many implementations of malloc will allocate more space than requested, store the allocation size in the first four (or eight) bytes, and then return a pointer that is offset to point to the next byte beyond that.
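To make that analogy concrete, here is a toy sketch (entirely hypothetical, not how any real allocator or the CLR actually works) of stashing a size header just before the pointer handed back to the caller:
#include <stdlib.h>
#include <stdint.h>

// Toy allocator, purely to illustrate the "header at a negative offset" idea.
void* my_alloc(size_t size) {
    uint8_t* raw = (uint8_t*)malloc(size + sizeof(size_t));
    if (!raw) return NULL;
    *(size_t*)raw = size;              // header lives just before the returned pointer
    return raw + sizeof(size_t);
}

size_t my_alloc_size(void* p) {
    return *((size_t*)p - 1);          // look "back" a few bytes, like sos does at -4
}

void my_free(void* p) {
    if (p) free((uint8_t*)p - sizeof(size_t));
}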
So after researching engines a lot, I've been building a 2D framework for the iPhone. As you know, the world of engine architecture is vast, so I've been trying to apply best practices as much as possible.
I've been using:
uint_fast8_t mId;
If I look up the definition of uint_fast8_t I find:
/* 7.18.1.3 Fastest-width integer types */
...
typedef uint8_t uint_fast8_t;
And I've been using these types throughout my code. My question is: is there a performance benefit to using these types? And what exactly is going on behind the scenes? Besides the obvious fact that this is the correct data type (unsigned 8-bit integer) for the data, is it worthwhile to have it peppered throughout my code?
Is this a needless optimization that the compiler would probably take care of anyways?
Thanks.
Edit: No responses/answers, so I'm putting a bounty on this!
The "fast" integer types are defined to be the fastest integer type available with at least the required number of bits (in this case 8).
If your platform defines uint_fast8_t as uint8_t then there will be absolutely no difference in speed.
The reason is that there may be architectures that are slower when not using their native word length. For example, I found one reference where uint_fast8_t was defined as "unsigned int" for Alpha processors.
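If you want to see what your particular toolchain picked, a quick way is just to print the sizes; a small sketch (output is platform-specific):
#include <stdint.h>
#include <stdio.h>

int main() {
    // On iOS/macOS the "fast" types are typedef'd to the exact-width types, so this
    // typically prints 1, 2, 4, 8 there; other platforms are free to widen them.
    printf("uint_fast8_t : %zu bytes\n", sizeof(uint_fast8_t));
    printf("uint_fast16_t: %zu bytes\n", sizeof(uint_fast16_t));
    printf("uint_fast32_t: %zu bytes\n", sizeof(uint_fast32_t));
    printf("uint_fast64_t: %zu bytes\n", sizeof(uint_fast64_t));
    return 0;
}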
A uint_fast8_t is the fastest integer type guaranteed to be at least 8 bits wide. Depending on your platform it could be 8, 16 or 32 bits wide.
It isn't taken care of by the compiler itself; it does indeed make your program execute faster.
Here are some resources I found; you might already have seen them: http://embeddedgurus.com/stack-overflow/2008/06/efficient-c-tips-1-choosing-the-correct-integer-size/
http://www.mail-archive.com/avr-gcc-list#nongnu.org/msg03149.html
The header in mingw64 says the fast types are "Not actually guaranteed to be fastest for all purposes":
/* 7.18.1.3 Fastest minimum-width integer types
* Not actually guaranteed to be fastest for all purposes <---------------------
* Here we use the exact-width types for 8 and 16-bit ints.
*/
typedef signed char int_fast8_t;
typedef unsigned char uint_fast8_t;
typedef short int_fast16_t;
typedef unsigned short uint_fast16_t;
typedef int int_fast32_t;
typedef unsigned int uint_fast32_t;
__MINGW_EXTENSION typedef long long int_fast64_t;
__MINGW_EXTENSION typedef unsigned long long uint_fast64_t;
And that still applies to ARM and other architectures, because using a narrow type requires zero- or sign-extension in many situations, which is less optimal than a native int.
However, narrow types do help with large arrays or in the case of slow operations (like division). I'm not sure how slow ARM division is, but on x86, 64-bit division is much slower than 32-bit or 8-bit division.
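As a rough illustration of the division point (my own hypothetical example, worth benchmarking on your actual target): the two functions below are identical except for operand width, and the 32-bit version lets the compiler emit the cheaper 32-bit divide on x86.
#include <stdint.h>
#include <stddef.h>

// 32-bit accumulator and modulus: the remainder each iteration is a 32-bit operation.
uint32_t mod_sum32(const uint32_t* v, size_t n, uint32_t m) {
    uint32_t acc = 0;
    for (size_t i = 0; i < n; ++i)
        acc = (acc + v[i]) % m;
    return acc;
}

// Same loop with 64-bit operands: the remainder becomes a 64-bit divide, which is
// typically noticeably slower on x86 cores.
uint64_t mod_sum64(const uint64_t* v, size_t n, uint64_t m) {
    uint64_t acc = 0;
    for (size_t i = 0; i < n; ++i)
        acc = (acc + v[i]) % m;
    return acc;
}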