(How) is it possible to do a bitwise copy of a vec4 (32-bit) into a float (32-bit)?
Already tried:
Vec4ToFloat and Back
A vec4 is usually 128 bits (4x32-bit float), so packing it into a single float is problematic.
Perhaps you mean packing an arbitrary 32-bit value into a float? You can use the built-in intBitsToFloat to copy 32 bits from an int into a float in a bitwise manner. However, the resulting float value might be a NaN of some kind, in which case converting it back to an int with floatBitsToInt might not give you the same bits.
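To see why the NaN caveat matters, here is a small C sketch of the same round trip (memcpy plays the role of intBitsToFloat/floatBitsToInt; the NaN payload is just for illustration). On the CPU a memory copy preserves the bits exactly, but GLSL implementations are permitted to quiet or canonicalize NaNs, which is where the round trip can lose information:

#include <stdio.h>
#include <string.h>
#include <stdint.h>

int main(void) {
    uint32_t in = 0x7FC00001u;    /* an arbitrary NaN bit pattern */
    float f;
    uint32_t out;

    memcpy(&f, &in, sizeof f);    /* bitwise int -> float, like intBitsToFloat */
    memcpy(&out, &f, sizeof out); /* bitwise float -> int, like floatBitsToInt */

    /* Prints "7FC00001 -> 7FC00001" here; a GPU may alter NaN payloads. */
    printf("%08X -> %08X\n", (unsigned)in, (unsigned)out);
    return 0;
}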
edit
From your comment, you seem to be confusing pixel formats in buffers with numeric formats in the shader. Whenever you read a pixel from a buffer or texture, it will be converted from the pixel format to a numeric format, depending on how you read the pixel.

If you've done a normal texture lookup on a sampler of some kind bound to an RGBA texture, then each channel of the texture will have been unpacked into a 32-bit float in the range 0.0..1.0. You can easily pack it back into a 32-bit uint with packUnorm4x8, and then convert that uint to a float with uintBitsToFloat (which might have the above-noted problem with NaN on the second step). Or you can just read it as a 32-bit int or float in the first place by using the appropriate texel type when you set up the sampler (which will affect interpolation, but you probably don't want interpolation at all, depending on what you are trying to do).
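For reference, a rough C sketch of what packUnorm4x8 does (clamp each channel to [0,1], scale to 0..255, round, first component in the least significant byte; the function name here is just for the example):

#include <stdint.h>
#include <math.h>

/* Roughly what GLSL's packUnorm4x8 does. */
static uint32_t pack_unorm4x8(float r, float g, float b, float a) {
    float v[4] = { r, g, b, a };
    uint32_t c[4];
    for (int i = 0; i < 4; ++i) {
        float x = v[i] < 0.0f ? 0.0f : (v[i] > 1.0f ? 1.0f : v[i]);
        c[i] = (uint32_t)roundf(x * 255.0f);
    }
    return c[0] | (c[1] << 8) | (c[2] << 16) | (c[3] << 24);
}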
Learning about the difference between Float and Double in Swift. I can't think of any reason to use Float. I know there are reasons, and I know I am just not experienced enough to understand them.
So my question is why would you use float in Swift?
why would you use float in Swift
Left to your own devices, you likely never would. But there are situations where you have to. For example, the value of a UISlider is a Float. So when you retrieve that number, you are working with a Float. It’s not up to you.
And so with all the other numerical types. Swift includes a numerical type corresponding to every numerical type that you might possibly encounter as you interface with Cocoa and the outside world.
Float is a typealias for Float32. Float32 and Float16 are incredibly useful for GPU programming with Metal. They both will feel as archaic someday on the GPU as they do on the CPU, but that day is years off.
https://developer.apple.com/metal/
Double
Represents a 64-bit floating-point number.
Has a precision of at least 15 decimal digits.
Float
Represents a 32-bit floating-point number.
Has a precision of as little as 6 decimal digits.
The appropriate floating-point type to use depends on the nature and range of values you need to work with in your code. In situations where either type would be appropriate, Double is preferred.
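The digit counts are easy to see by printing the same value at both precisions. A quick C illustration (the same behavior holds for Swift's Float and Double, since both are IEEE 754 types):

#include <stdio.h>

int main(void) {
    float  f = 1.0f / 3.0f;  /* good to roughly 6-7 significant decimal digits */
    double d = 1.0 / 3.0;    /* good to roughly 15-16 significant decimal digits */

    printf("float : %.17f\n", f);  /* prints 0.33333334326744080 */
    printf("double: %.17f\n", d);  /* prints 0.33333333333333331 */
    return 0;
}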
I'm reading an IMU on an Arduino board with an S-Function block in Simulink using double or single data types, though I only need 2 decimal places of precision ("xyz.ab"). I want to improve performance by changing data types, and I wonder:
is there a way to decrease the precision to 2 decimals in the S-Function block, or by adding/using other conversion blocks/code in Simulink, aside from using the Fixed-Point Tool?
For true fixed-point transfer, the Fixed-Point Toolbox is the most general answer, as stated in Phil's comment.
However, to avoid toolbox use, you could also devise your own fixed-point integer format and add a block that takes a floating-point input and converts it into that integer format (and vice versa on the output).
E.g. if you know the range is -327.68 < var < 327.67, you could just define your format as an int16 divided by 100. In a MATLAB Function block you would then just say
y=int16(u*100.0);
to convert the input to the S-function.
On the output it would be a reversal
y=double(u)/100.0;
(EML/MATLAB Function code can be avoided by using multiply, divide, and convert blocks.)
However, be mindful of the bits available, and note that the scaling operations (*, /) are done in floating point rather than on the integers.
For signed types, 2^(nrOfBits-1)-1 is the largest value you can represent; for the unsigned types uint8/16/32 the maximum is 2^nrOfBits - 1. Use the scaling to fit the representable integers into your floating-point range. The represented range divided by 2^nrOfBits tells you what the resolution will be (how large the steps are); e.g. int16 with a scale of 100 covers -327.68 .. 327.67 in steps of 0.01.
You will need to scale the variables correspondingly on the Arduino side as well when you go to an integer interface of this type. (I'm assuming you have access to that code - if not, it'd be hard to use any interface other than what is already provided.)
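For illustration, the matching encode/decode on the Arduino (C) side might look like the sketch below, assuming the int16-times-100 format from above (the names are made up for the example):

#include <stdint.h>
#include <math.h>

#define FIX_SCALE 100.0f  /* the int16 wire value represents var * 100 */

/* Encode a float into the fixed-point wire format (rounding, not truncating).
 * Assumes var is within the representable range -327.68 .. 327.67. */
static int16_t fix_encode(float var) {
    return (int16_t)lroundf(var * FIX_SCALE);
}

/* Decode the wire format back into a float. */
static float fix_decode(int16_t raw) {
    return (float)raw / FIX_SCALE;
}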
Note that intXX(doubleVar*scale) will truncate the value to an integer. If you need proper rounding, you should also include the round function, e.g.:
int16(round(doubleVar*scale));
You don't need to use a base-10 scale; any scaling and offset can be used, but it's easier to read the numbers manually if you stick to base 10 (i.e. 0.1, 10.0, 100.0, 1000.0, etc.).
As a final note: if the Arduino code interface is floating point (single/double) and can't be changed to an integer type, you will not get any speedup from rounding off decimals, since the full floating-point value is what will be transferred anyway. Even if you do manage to reduce the data a bit using integers, I suspect this won't give a huge speedup unless you transfer large amounts of data; the interface code has a comparatively large overhead anyway.
Good luck with your project!
I would like to transfer single-precision float values from MATLAB/Simulink into my Python script. Data in MATLAB/Simulink are saved into a V7 MAT-file (so that I can use scipy.io.loadmat()). The float values end up in a Python dictionary after loadmat().
My question is whether or not the float value I get in Python is identical to the one originally in MATLAB/Simulink, down to the last bit.
For example, the float value 0.32 is ‘00111110101000111101011100001010’.
When this "0.32" is read into Python, can I reproduce the exact bit representation, nothing more, nothing less?
Both MATLAB and Python use the IEEE 754 single-precision format to represent 32-bit floating-point numbers, so you can guarantee that both will represent the same float value using the exact same bit pattern.
I have generated the binary string for a float value in the MATLAB/Simulink environment and in the Python environment (after scipy.io.loadmat()). They are identical down to the last bit. Therefore, you can use scipy.io.loadmat() to transfer data to Python bit-exactly.
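If you want to check the bit pattern yourself outside of MATLAB or Python, a small C program will do it; this one prints the 32 bits of 0.32f:

#include <stdio.h>
#include <string.h>
#include <stdint.h>

int main(void) {
    float f = 0.32f;
    uint32_t bits;

    memcpy(&bits, &f, sizeof bits);  /* bit-for-bit copy of the IEEE 754 encoding */
    for (int i = 31; i >= 0; --i)
        putchar('0' + (int)((bits >> i) & 1u));
    putchar('\n');  /* prints 00111110101000111101011100001010 */
    return 0;
}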
Does MATLAB support float16 operations? If so, how do I convert a double matrix to float16? I am doing arithmetic on a large matrix for which a 16-bit floating-point representation is sufficient, and storing it as double takes four times the memory.
Is your matrix full? Otherwise, try sparse -- it saves a lot of memory if there are lots of zero-valued elements.
AFAIK, float16 is not supported. The lowest you can go with a float datatype is single, which is a 32-bit type:
A = single( rand(50) );
You could multiply by a constant and cast to int16, but you'd lose precision.
The numeric classes that MATLAB supports out of the box are the following:
int8
int16
int32
int64
uint8
uint16
uint32
uint64
single (32-bit float)
double (64-bit float)
plus the complex data type. So no 16-bit floats, unfortunately.
On the MathWorks File Exchange, there seems to be a half-precision float library. It requires MEX, however.
This might be an old question, but I found it while searching for a similar problem (half precision in MATLAB).
Things seem to have changed over time:
https://www.mathworks.com/help/fixedpoint/ref/half.html
Half precision now seems to be supported natively by MATLAB, via the half type documented under Fixed-Point Designer.
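If you end up doing the half-precision conversion outside MATLAB, note that some recent C compilers (e.g. GCC and Clang on x86-64 and ARM targets) expose a _Float16 type; availability is target-dependent, so treat this as a sketch:

#include <stdio.h>

int main(void) {
    double d = 3.14159265358979;
    _Float16 h = (_Float16)d;   /* rounds to the nearest 16-bit float */

    /* Half precision has only a 10-bit mantissa, so this prints about 3.140625. */
    printf("%f\n", (double)h);
    return 0;
}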
When a float is cast to an int, how is the cast implemented by the compiler?
Does the compiler mask off some part of the float variable's memory, i.e., is part of the memory dropped so that the rest can be passed to the int variable?
I guess the answer to this lies in how int and float are represented in memory.
But isn't that machine dependent rather than compiler dependent? How does the compiler decide which part of the memory to copy when casting to a narrower type (this is a static cast, right)?
I am kind of confused by some wrong information, I guess.
(I read some questions under tag=downcasting, where there was a debate on whether this is a cast or a conversion. I am not that interested in what it is called, since both are performed by the compiler, but in how it is performed.)
...
Thanks
When talking about basic types and not pointers, a conversion is done. Because floating-point and integer representations are very different (usually IEEE 754 and two's complement, respectively), it's more than just masking out some bits.
If you wanted to see the floating point number represented as an int without doing a conversion, you can do something like this (in C):
float f = 10.5;
int i2 = *(int*)&f;  /* reinterpret the bits (type punning; memcpy is the strictly safe way) */
printf("%f %d\n", f, i2);
Most CPU architectures provide a native instruction (or multi-instruction sequence) to do float<->int conversions, and the compiler will generally just generate that instruction. There are often faster methods, though. This question has some good information: What is the fastest way to convert float to int on x86.
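To make the distinction concrete, here is a small sketch contrasting the value-converting cast with a bit-for-bit reinterpretation (memcpy is the aliasing-safe way to do the latter):

#include <stdio.h>
#include <string.h>

int main(void) {
    float f = 10.5f;
    int converted = (int)f;  /* value conversion: truncates toward zero -> 10 */
    int bits;

    memcpy(&bits, &f, sizeof bits);  /* raw IEEE 754 bits -> 1093664768 (0x41280000) */
    printf("%d %d\n", converted, bits);
    return 0;
}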