ASTC texture compression in Metal – what should I use as the bytes per row? - swift

I'm writing a program that's using compressed textures in Metal. I'm having a bit of trouble with the replaceRegion() function of MTLTexture. The parameter bytesPerRow just doesn't seem to make sense. The documentation says that for compressed textures, "bytesPerRow is the number of bytes from the beginning of one row of blocks to the beginning of the next."
Now I'm using ASTC with 4x4 blocks, which means that I have 8 bpp. So 4*4 is 16 texels, at 8 bits (one byte) per texel, which means each block should be 16 bytes. And yet, when I pass 16, I get a failed assertion that requires the minimum value to be 4096. What's going on?
Thank you very much.

bytesPerRow is the stride of an entire row of blocks, not the size of a single block:
bytesPerRow = texelsPerRow / blockFootprint.x * 16
Each 4x4 block is indeed 16 bytes, but a row of blocks spans the full width of the texture. A texture 1024 texels wide has 1024 / 4 = 256 blocks per row, i.e. 256 * 16 = 4096 bytes per row, which is presumably where the 4096 minimum in your assertion comes from. If the texture width is not a multiple of the block width, round it up to a whole number of blocks first:

uint32_t bytes_per_row(uint32_t texture_width, uint32_t block_width) {
    // Round the width up to a whole number of blocks, then 16 bytes per block.
    uint32_t padded_width = texture_width % block_width
        ? texture_width + (block_width - texture_width % block_width)
        : texture_width;
    return padded_width / block_width * 16;
}
This rounds the texture width up to a multiple of the block width first. For example, a 1024x1024 texture encoded with a block size of 6x6 needs 2736 bytes per row: 1024 rounds up to 1026, and 1026 / 6 * 16 == 2736.
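For completeness, here is how that plugs into MTLTexture's replace(region:mipmapLevel:withBytes:bytesPerRow:) in Swift. The upload(texture:astcData:) helper and its parameters are hypothetical placeholders for whatever your loader produces; the texture is assumed to use a 4x4 ASTC pixel format such as .astc_4x4_ldr:

import Metal

// Bytes per row of an ASTC texture: one full row of blocks, 16 bytes per block.
// The texel width is rounded up to a whole number of blocks first.
func bytesPerRow(textureWidth: Int, blockWidth: Int) -> Int {
    let blocksPerRow = (textureWidth + blockWidth - 1) / blockWidth
    return blocksPerRow * 16
}

// Hypothetical upload of mip level 0 of an ASTC 4x4 texture.
func upload(texture: MTLTexture, astcData: [UInt8]) {
    // e.g. 1024 / 4 * 16 == 4096 for a 1024-texel-wide texture
    let rowStride = bytesPerRow(textureWidth: texture.width, blockWidth: 4)
    astcData.withUnsafeBytes { bytes in
        texture.replace(region: MTLRegionMake2D(0, 0, texture.width, texture.height),
                        mipmapLevel: 0,
                        withBytes: bytes.baseAddress!,
                        bytesPerRow: rowStride)
    }
}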

Related

How is flutter sound stream representing PCM data?

I need to make an app that visualizes audio data by graphing it, and I've tried flutter_sound and sound_stream to get the raw audio data. However, both of these libraries capture the sound as 16-bit PCM but return a stream of Uint8List, so I don't understand how they are representing the 16-bit PCM with 8-bit integers.
I've tried to just graph the numbers as-is, but it doesn't appear to be right. The following is the code I used to graph a 30 Hz sine wave with the data sound_stream provides.
final Uint8List data;

@override
void paint(Canvas canvas, Size size) {
  double dx = size.width / data.length;
  for (int i = 1; i < data.length; i += 1) {
    canvas.drawLine(
      Offset((i - 1) * dx, (data[i - 1].toDouble() / 256) * size.height),
      Offset(i * dx, (data[i].toDouble() / 256) * size.height),
      Paint()
        ..color = Colors.red
        ..strokeWidth = 1,
    );
  }
}
It's a Uint8List because that's a byte array and is what gets moved across the native-Dart boundary. So you need to view that byte array as a list of 16 bit integers. Use ByteBuffer to do this.
final Uint8List data;
final pcm16 = data.buffer.asInt16List();
// pcm16 is an Int16List and will be half the length of data
// expect values in pcm16 to be between -32768 and +32767 so
// normalize by dividing by 32768.0 to get -1.0 to +1.0
You will probably find that you are getting twice as many bytes as you expect. If you are sampling at 16kHz, expect 32k bytes a second, but you'll get 16000 samples a second when viewed as 16 bit ints.
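For comparison, here is the same byte-pairing written out by hand, as a sketch in Swift rather than Dart; it assumes the stream delivers little-endian 16-bit PCM, which is the usual case on ARM and x86 hosts:

// Combine consecutive byte pairs of little-endian 16-bit PCM into samples,
// then normalize to -1.0 ... +1.0 for plotting.
func pcm16Samples(from bytes: [UInt8]) -> [Float] {
    var samples: [Float] = []
    samples.reserveCapacity(bytes.count / 2)
    var i = 0
    while i + 1 < bytes.count {
        let lo = UInt16(bytes[i])
        let hi = UInt16(bytes[i + 1])
        let sample = Int16(bitPattern: lo | (hi << 8))  // little-endian pair -> signed 16-bit
        samples.append(Float(sample) / 32768.0)
        i += 2
    }
    return samples
}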

Why does VGG-16 take an input size of 512 * 7 * 7?

According to https://github.com/pytorch/vision/blob/master/torchvision/models/vgg.py
I don't understand why VGG models use 512 * 7 * 7 as the input size of the fully-connected layer.
The last convolution layer is
nn.Conv2d(512, 512, kernel_size=3, padding=1),
nn.ReLU(True),
nn.MaxPool2d(kernel_size=2, stride=2, dilation=1)
Code from the above link:
class VGG(nn.Module):
    def __init__(self, features, num_classes=1000, init_weights=True):
        super(VGG, self).__init__()
        self.features = features
        self.classifier = nn.Sequential(
            nn.Linear(512 * 7 * 7, 4096),
            nn.ReLU(True),
            nn.Dropout(),
            nn.Linear(4096, 4096),
            nn.ReLU(True),
            nn.Dropout(),
            nn.Linear(4096, num_classes),
        )
To understand this you have to know how the convolution operator works for CNNs.
nn.Conv2d(512, 512, kernel_size=3, padding=1) means that the input to that convolution has 512 channels and that the output after the convolution will also have 512 channels. The input is convolved with a 3x3 kernel that moves over it as a sliding window. Finally, padding=1 means that before applying the convolution we symmetrically add zeroes to the edges of the input matrix.
In your example, you can think of 512 as the depth, while 7x7 is the width and height obtained by applying several convolutions. Imagine that we feed an image with some width and height to a convolution; the resulting size will be
owidth = floor(((width + 2*padW - kW) / dW) + 1)
oheight = floor(((height + 2*padH - kH) / dH) + 1)
where width and height are the original sizes, padW and padH are the horizontal and vertical padding, kW and kH are the kernel sizes, and dW and dH are the horizontal and vertical strides, i.e. how many pixels the kernel moves at each step (with dW=1 the kernel starts at pixel (0,0) and then moves to (1,0), and so on).
Usually the first convolution operator in a CNN looks like nn.Conv2d(3, D, kernel_size=3, padding=1), because the original image has 3 input channels (RGB). Assuming the input image is 256x256x3 pixels, applying this operator gives an output with the same width and height as the input, but with depth D. Similarly, if we define the convolution as c = nn.Conv2d(3, 15, kernel_size=25, padding=0, stride=5), i.e. kernel_size=25, no padding and stride=5 (dW=dH=5, so the kernel moves 5 pixels at a time: from (0,0) to (5,0) and so on until it reaches the end of the image on the x-axis, then to (0,5) -> (5,5) -> (10,5) until it reaches the end again), the resulting output will have a size of 47x47x15.
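As a quick numeric check of that formula, here is a small Swift sketch (the function name and labels are just for illustration):

// Output spatial size of a convolution.
func convOutputSize(_ size: Int, kernel: Int, padding: Int, stride: Int) -> Int {
    // floor((size + 2*pad - kernel) / stride) + 1; integer division already floors for positive values
    return (size + 2 * padding - kernel) / stride + 1
}

print(convOutputSize(256, kernel: 3, padding: 1, stride: 1))   // 256: kernel_size=3, padding=1 preserves the size
print(convOutputSize(256, kernel: 25, padding: 0, stride: 5))  // 47: the 47x47x15 example above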
The VGG network has two sections of layers: the "features" section and the "classifier" section. The input to the features section is always an image of size 224 x 224 pixels.
The features section contains 5 nn.MaxPool2d(kernel_size=2, stride=2) pooling layers. See line 76 of the referenced source code: each 'M' character in the configurations sets up one MaxPool2d layer.
A MaxPool2d layer with these parameters halves the spatial size of the tensor. So we have 224 --> 112 --> 56 --> 28 --> 14 --> 7, which means the output of the features section is a tensor of 512 channels * 7 * 7. This is the input to the "classifier" section.
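The halving chain written out, again just a sketch of the arithmetic in Swift:

// Five MaxPool2d(kernel_size=2, stride=2) layers each halve the spatial size.
var size = 224
for _ in 0..<5 {
    size /= 2
}
print(size)               // 7
print(512 * size * size)  // 25088, i.e. 512 * 7 * 7, the in_features of the first nn.Linear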

Why is this supported CGBitmapContextCreate parameter combination rejected as unsupported?

I have the following Swift code to make a CGBitmapContext
let imageDirectoryPath:String = "/Users/blah/blah/"
let imageFileName:String = "flower.tif"
let imageNS = NSImage(contentsOfFile: imageDirectoryPath + imageFileName)!
let imageCG = imageNS.CGImageForProposedRect(nil, context: nil, hints: nil)
var rawDataIn:[UInt8] = [UInt8](count: Int(imageNS.size.width) * Int(imageNS.size.height) * 4, repeatedValue: 0xff)
let context = CGBitmapContextCreate(
    &rawDataIn,
    Int(imageNS.size.width),
    Int(imageNS.size.height),
    8,
    Int(imageNS.size.width * 4),
    CGColorSpaceCreateDeviceRGB(),
    CGImageAlphaInfo.PremultipliedLast.rawValue)
I get an error from this, and by setting the environment variable CGBITMAP_CONTEXT_LOG_ERRORS in the project's Xcode scheme I get more detail about the error:
Nov 10 10:12:16 SwiftConsoleGrabcut[826] :
CGBitmapContextCreate: unsupported parameter combination:
8 integer bits/component;
32 bits/pixel;
RGB color space model; kCGImageAlphaPremultipliedLast;
19789 bytes/row.
Valid parameters for RGB color space model are:
16 bits per pixel, 5 bits per component, kCGImageAlphaNoneSkipFirst
32 bits per pixel, 8 bits per component, kCGImageAlphaNoneSkipFirst
32 bits per pixel, 8 bits per component, kCGImageAlphaNoneSkipLast
32 bits per pixel, 8 bits per component, kCGImageAlphaPremultipliedFirst
32 bits per pixel, 8 bits per component, kCGImageAlphaPremultipliedLast
64 bits per pixel, 16 bits per component, kCGImageAlphaPremultipliedLast
64 bits per pixel, 16 bits per component, kCGImageAlphaNoneSkipLast
128 bits per pixel, 32 bits per component, kCGImageAlphaNoneSkipLast |kCGBitmapFloatComponents
128 bits per pixel, 32 bits per component, kCGImageAlphaPremultipliedLast |kCGBitmapFloatComponents
See Quartz 2D Programming Guide (available online) for more information.
But the parameter combination I use, 32 bits per pixel, 8 bits per component, and kCGImageAlphaPremultipliedLast is one of the supported combinations listed.
Why is it rejected?
This may be a question about size.width. The tif image I am loading into imageNS is 481 pixels wide, but imageNS.size.width is reported as 4947.4285714285716, which may explain the bonkers value given in the error message for bytes per row.
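If that is indeed the cause, one way to sidestep it is to take the width and height from the CGImage, which are in pixels, instead of from imageNS.size, which is in points. Just a sketch of the idea, using the same Swift 2-era API as the code above:

let imageCG = imageNS.CGImageForProposedRect(nil, context: nil, hints: nil)!
let pixelWidth = CGImageGetWidth(imageCG)    // 481 for this tif, not 4947
let pixelHeight = CGImageGetHeight(imageCG)

var rawDataIn: [UInt8] = [UInt8](count: pixelWidth * pixelHeight * 4, repeatedValue: 0xff)
let context = CGBitmapContextCreate(
    &rawDataIn,
    pixelWidth,
    pixelHeight,
    8,
    pixelWidth * 4,
    CGColorSpaceCreateDeviceRGB(),
    CGImageAlphaInfo.PremultipliedLast.rawValue)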

Image Compression using imfinfo function in Matlab

I am trying to calculate the compression ratio of a given image. My matlab code is as follows:
temp = imfinfo('flowers.jpg');
compression_ratio = (temp.Width * temp.Height * temp.BitDepth) / temp.FileSize;
The imfinfo displays the following:
FileSize: 11569
Format: 'jpg'
FormatVersion: ''
Width: 430
Height: 430
BitDepth: 8
ColorType: 'grayscale'
FormatSignature: ''
NumberOfSamples: 1
CodingMethod: 'Huffman'
CodingProcess: 'Sequential'
Comment: {}
Running the above code gives me a compression ratio of about 120, which is huge and does not seem right. Is there something that I'm doing wrong? I went through a document from MIT which showed that Width * Height * BitDepth should be divided by 8 and then divided by the FileSize. Why divide by 8?
The division by a factor of 8 converts bits to bytes.
According to the Matlab documentation for imfinfo
the FileSize parameter is the size of the compressed file, in bytes.
The compression ratio is defined as:
uncompressed size of image in bytes/compressed size of file in bytes
imfinfo gives you the pixel width, height, and bits per pixel (bit depth). From that you can compute the uncompressed size in bits, and divide by 8 to get bytes.
For the uncompressed image, you have 430 * 430 * 8 / 8 = 184,900 bytes.
The size of the compressed image is 11569 bytes.
So the compression ratio is actually 184,900/11569 or 15.98, not an unreasonable value for JPEG.
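The same arithmetic as a quick sanity check, written as a small Swift sketch using the numbers reported by imfinfo above:

let width = 430, height = 430, bitDepth = 8
let fileSizeBytes = 11569                              // imfinfo's FileSize is already in bytes
let uncompressedBytes = width * height * bitDepth / 8  // bits -> bytes: 184900
let ratio = Double(uncompressedBytes) / Double(fileSizeBytes)
print(ratio)                                           // ~15.98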

How to get the real RGBA or ARGB color values without premultiplied alpha?

I'm creating a bitmap context using CGBitmapContextCreate with the kCGImageAlphaPremultipliedFirst option.
I made a 5 x 5 test image with some primary colors (pure red, green, blue, white, black) and some mixed colors (e.g. purple), combined with some alpha variations. Whenever the alpha component is not 255, the color values are wrong.
I found that I could re-calculate the color when I do something like:
almostCorrectRed = wrongRed * (255 / alphaValue);
almostCorrectGreen = wrongGreen * (255 / alphaValue);
almostCorrectBlue = wrongBlue * (255 / alphaValue);
But the problem is, that my calculations are sometimes off by 3 or even more. So for example I get a value of 242 instead of 245 for green, and I am 100% sure that it must be exactly 245. Alpha is 128.
Then, for the exact same color just with different alpha opacity in the PNG bitmap, I get alpha = 255 and green = 245 as it should be.
If alpha is 0, then red, green and blue are also 0. Here all data is lost and I can't figure out the color of the pixel.
How can I avoid or undo this alpha premultiplication altogether so that I can modify pixels in my image based on the true R G B values as they were when the image was created in Photoshop? How can I recover the original values for R, G, B and A?
Background info (probably not necessary for this question):
What I'm doing is this: I take a UIImage and draw it to a bitmap context in order to perform some simple image manipulation algorithms on it, shifting the color of each pixel depending on what color it was before. Nothing really special. But my code needs the real colors. When a pixel is transparent (meaning it has alpha less than 255) my algorithm shouldn't care about this; it should just modify R, G, B as needed while alpha remains whatever it is. Sometimes, though, it will shift alpha up or down too. But I see them as two separate things: alpha controls transparency, while R, G, B control the color.
This is a fundamental problem with premultiplication in an integral type:
245 * (128/255) = 122.98
122.98 truncated to an integer = 122
122 * (255/128) = 243.046875
I'm not sure why you're getting 242 instead of 243, but this problem remains either way, and it gets worse the lower the alpha goes.
The solution is to use floating-point components instead. The Quartz 2D Programming Guide gives the full details of the format you'll need to use.
Important point: You'd need to use floating-point from the creation of the original image (and I don't think it's even possible to save such an image as PNG; you might have to use TIFF). An image that was already premultiplied in an integral type has already lost that precision; there is no getting it back.
The zero-alpha case is the extreme version of this, to such an extent that even floating-point cannot help you. Anything times zero (alpha) is zero, and there is no recovering the original unpremultiplied value from that point.
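The round trip is easy to reproduce. Here is a small Swift sketch of the 8-bit math described above; plain truncating division is used, so CoreGraphics' exact rounding may land one value off, which is presumably where 242 instead of 243 comes from:

// Premultiply, then unpremultiply, with 8-bit integer components.
func premultiply(_ value: Int, alpha: Int) -> Int {
    return value * alpha / 255            // truncating integer division
}
func unpremultiply(_ value: Int, alpha: Int) -> Int {
    return alpha == 0 ? 0 : min(255, value * 255 / alpha)
}

let original = 245
let stored = premultiply(original, alpha: 128)     // 122
let recovered = unpremultiply(stored, alpha: 128)  // 243, not 245
print(stored, recovered)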
Pre-multiplying alpha with an integer color type is a lossy operation: data is destroyed during the quantization (rounding to 8 bits).
Since some data is destroyed by that rounding, there is no way to recover the exact original pixel color (except for some lucky values). You have to save the colors of your Photoshop image before you draw it into a bitmap context, and use that original color data, not the multiplied color data from the bitmap.
I ran into this same problem when trying to read image data, render it to another image with CoreGraphics, and then save the result as non-premultiplied data. The solution that worked for me was to save a table containing the exact mapping that CoreGraphics uses to map non-premultiplied data to premultiplied data. Then, estimate what the original unpremultiplied value would be with a multiply and a floor() call, run the estimate back through the premultiply table, and, if that does not reproduce the premultiplied value you started with, check the value just below or just above the estimate for the exact match.
// Execute premultiply logic on the separate RGBA components.
// For example, a pixel RGB (255, 0, 0) with A = 128
// would return (128, 0, 0) with A = 128.
static
inline
uint32_t premultiply_bgra_inline(uint32_t red, uint32_t green, uint32_t blue, uint32_t alpha)
{
    const uint8_t* const restrict alphaTable = &extern_alphaTablesPtr[alpha * PREMULT_TABLEMAX];
    uint32_t result = (alpha << 24) | (alphaTable[red] << 16) | (alphaTable[green] << 8) | alphaTable[blue];
    return result;
}
static inline
int unpremultiply(const uint32_t premultRGBComponent, const float alphaMult, const uint32_t alpha)
{
    float multVal = premultRGBComponent * alphaMult;
    float floorVal = floor(multVal);
    uint32_t unpremultRGBComponent = (uint32_t)floorVal;
    assert(unpremultRGBComponent >= 0);
    if (unpremultRGBComponent > 255) {
        unpremultRGBComponent = 255;
    }

    // Pass the unpremultiplied estimated value through the
    // premultiply table again to verify that the result
    // maps back to the same rgb component value that was
    // passed in. It is possible that the result of the
    // multiplication is smaller or larger than the
    // original value, so this will either add or remove
    // one int value to the result rgb component to account
    // for the error possibility.

    uint32_t premultPixel = premultiply_bgra_inline(unpremultRGBComponent, 0, 0, alpha);
    uint32_t premultActualRGBComponent = (premultPixel >> 16) & 0xFF;

    if (premultRGBComponent != premultActualRGBComponent) {
        if ((premultActualRGBComponent < premultRGBComponent) && (unpremultRGBComponent < 255)) {
            unpremultRGBComponent += 1;
        } else if ((premultActualRGBComponent > premultRGBComponent) && (unpremultRGBComponent > 0)) {
            unpremultRGBComponent -= 1;
        } else {
            // This should never happen
            assert(0);
        }
    }

    return unpremultRGBComponent;
}
You can find the complete static table of values at this github link.
Note that this approach will not recover information "lost" when the original unpremultiplied pixel was premultiplied. But it does return the smallest unpremultiplied pixel value that becomes the given premultiplied pixel once run through the premultiply logic again. This is useful when the graphics subsystem only accepts premultiplied pixels (like CoreGraphics on OSX). If the graphics subsystem only accepts premultiplied pixels, then you are better off storing only the premultiplied pixels, since that consumes less space than also keeping the unpremultiplied pixels around.