I have the following Swift code to make a CGBitmapContext
let imageDirectoryPath:String = "/Users/blah/blah/"
let imageFileName:String = "flower.tif"
let imageNS = NSImage(contentsOfFile: imageDirectoryPath + imageFileName)!
let imageCG = imageNS.CGImageForProposedRect(nil, context: nil, hints: nil)
var rawDataIn:[UInt8] = [UInt8](count: Int(imageNS.size.width) * Int(imageNS.size.height) * 4, repeatedValue: 0xff)
let context = CGBitmapContextCreate(
    &rawDataIn,
    Int(imageNS.size.width),
    Int(imageNS.size.height),
    8,
    Int(imageNS.size.width * 4),
    CGColorSpaceCreateDeviceRGB(),
    CGImageAlphaInfo.PremultipliedLast.rawValue)
I get an error from this, and by setting the environment variable CGBITMAP_CONTEXT_LOG_ERRORS in the project's Xcode scheme I get more detail about the error:
Nov 10 10:12:16 SwiftConsoleGrabcut[826] :
CGBitmapContextCreate: unsupported parameter combination:
8 integer bits/component;
32 bits/pixel;
RGB color space model; kCGImageAlphaPremultipliedLast;
19789 bytes/row.
Valid parameters for RGB color space model are:
16 bits per pixel, 5 bits per component, kCGImageAlphaNoneSkipFirst
32 bits per pixel, 8 bits per component, kCGImageAlphaNoneSkipFirst
32 bits per pixel, 8 bits per component, kCGImageAlphaNoneSkipLast
32 bits per pixel, 8 bits per component, kCGImageAlphaPremultipliedFirst
32 bits per pixel, 8 bits per component, kCGImageAlphaPremultipliedLast
64 bits per pixel, 16 bits per component, kCGImageAlphaPremultipliedLast
64 bits per pixel, 16 bits per component, kCGImageAlphaNoneSkipLast
128 bits per pixel, 32 bits per component, kCGImageAlphaNoneSkipLast |kCGBitmapFloatComponents
128 bits per pixel, 32 bits per component, kCGImageAlphaPremultipliedLast |kCGBitmapFloatComponents
See Quartz 2D Programming Guide (available online) for more information.
But the parameter combination I use (32 bits per pixel, 8 bits per component, and kCGImageAlphaPremultipliedLast) is one of the supported combinations listed.
Why is it rejected?
This may be a question about size.width. The tif image I am loading into imageNS is 481 pixels wide, but imageNS.size.width is reported as 4947.4285714285716 which may explain the bonkers value given in the error message for bytes per row.
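That arithmetic checks out; here is a quick illustration of the numbers (in Python, purely to show where 19789 comes from):
width_points = 4947.4285714285716   # imageNS.size.width, which NSImage reports in points
print(int(width_points * 4))        # 19789 -- the bytes/row value in the error log
width_pixels = 481                  # the actual pixel width of the TIFF
print(width_pixels * 4)             # 1924 -- bytes/row a genuine 481-pixel-wide RGBA bitmap needs
NSImage.size is measured in points rather than pixels, so sizing the bitmap context from it does not match the image's pixel dimensions.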
I created a simple model with Keras to understand the cropping layer:
import keras

def other_model():
    x = keras.Input(shape=(64, 64, 3))
    conv = keras.layers.Conv2D(5, 2)(x)
    crop = keras.layers.Cropping2D(cropping=32)(conv)
    model = keras.Model(x, crop)
    model.summary()
    return model
But I get the following summary
Layer (type) Output Shape Param #
input_12 (InputLayer) (None, 64, 64, 3) 0
conv2d_21 (Conv2D) (None, 63, 63, 5) 65
cropping2d_13 (Cropping2D) (None, 0, 0, 5) 0
Total params: 65
Trainable params: 65
Non-trainable params: 0
Why are the 1st and 2nd dimensions of the Cropping2D output equal to zero? They are supposed to be 32.
You just choose the number of pixels that will be cut off at every side of your image. Here it was chosen to be greater than or equal to half the size of the image, so it didn't work.
It is a bit unclear in the documentation, but if you give a single integer value (cropping=32) as the parameter, it crops off 32 pixels on each side of the image.
If you have an image of 64x64 pixels and cropping=32, the target size will therefore be 0x0 pixels...
If you want a target size of 32x32 pixels, you have to give cropping=16.
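As a quick check, here is a minimal sketch (assuming the same keras API as in the question) that crops a 64x64 input down to 32x32:
import keras

x = keras.Input(shape=(64, 64, 3))
crop = keras.layers.Cropping2D(cropping=16)(x)   # 64 - 16 - 16 = 32 on each axis
model = keras.Model(x, crop)
model.summary()   # Cropping2D output shape: (None, 32, 32, 3)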
Looking at https://github.com/pytorch/vision/blob/master/torchvision/models/vgg.py, I don't understand why the VGG models use 512 * 7 * 7 as the input size of the first fully-connected layer.
The last convolution block is:
nn.Conv2d(512, 512, kernel_size=3, padding=1),
nn.ReLU(True),
nn.MaxPool2d(kernel_size=2, stride=2, dilation=1)
The relevant code from the link above:
class VGG(nn.Module):

    def __init__(self, features, num_classes=1000, init_weights=True):
        super(VGG, self).__init__()
        self.features = features
        self.classifier = nn.Sequential(
            nn.Linear(512 * 7 * 7, 4096),
            nn.ReLU(True),
            nn.Dropout(),
            nn.Linear(4096, 4096),
            nn.ReLU(True),
            nn.Dropout(),
            nn.Linear(4096, num_classes),
        )
To understand this you have to know how the convolution operator works for CNNs.
nn.Conv2d(512, 512, kernel_size=3, padding=1) means that the input to that convolution has 512 channels and that the output after the convolution will also have 512 channels. The input is convolved with a 3x3 kernel that moves across it as a sliding window. Finally, padding=1 means that before applying the convolution, we symmetrically pad the edges of the input matrix with zeros.
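A small sketch (not part of the VGG source) shows this shape behaviour directly:
import torch
import torch.nn as nn

conv = nn.Conv2d(512, 512, kernel_size=3, padding=1)
x = torch.zeros(1, 512, 14, 14)   # batch of 1, 512 channels, arbitrary 14x14 spatial size
print(conv(x).shape)              # torch.Size([1, 512, 14, 14]) -- channels and spatial size preserved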
In your example, you can think of 512 as the depth, while 7x7 is the width and height obtained after applying several convolution and pooling layers. Imagine that we have an image with some width and height and we feed it to a convolution; the resulting size will be
owidth = floor(((width + 2*padW - kW) / dW) + 1)
oheight = floor(((height + 2*padH - kH) / dH) + 1)
where width and height are the original sizes, padW and padH are the horizontal and vertical padding, kW and kH are the kernel sizes, and dW and dH are the horizontal and vertical strides, i.e. how many pixels the kernel moves each step (if dW=1, the kernel starts at pixel (0,0) and then moves to (1,0), and so on).
Usually the first convolution operator in a CNN looks like nn.Conv2d(3, D, kernel_size=3, padding=1), because the original image has 3 input channels (RGB). Assuming the input image has a size of 256x256x3 pixels, applying that operator produces an output with the same width and height as the input, but with depth D. Similarly, if we define the convolution as c = nn.Conv2d(3, 15, kernel_size=25, padding=0, stride=5), i.e. kernel_size=25, no padding, and stride=5 (dW=dH=5, meaning the kernel moves 5 pixels at a time: from (0,0) it goes to (5,0) and so on until it reaches the end of the image on the x-axis, then it moves to (0,5) -> (5,5) -> (10,5) and so on), the resulting output will have a size of 47x47x15.
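Plugging the numbers into the formula above (a small helper just for verification):
import math

def conv_output_size(size, kernel, padding, stride):
    return math.floor((size + 2 * padding - kernel) / stride + 1)

print(conv_output_size(256, 25, 0, 5))   # 47 -- the second example above
print(conv_output_size(224, 3, 1, 1))    # 224 -- a VGG conv (kernel_size=3, padding=1, stride=1) keeps the size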
The VGG neural net has two sections of layers: the "feature" layers and the "classifier" layers. The input to the feature section is always an image of size 224 x 224 pixels.
The feature section contains 5 nn.MaxPool2d(kernel_size=2, stride=2) layers. See line 76 of the referenced source code: each 'M' character in the configurations sets up one MaxPool2d layer.
A MaxPool2d layer with these specific parameters halves each spatial dimension of the tensor. So we have 224 --> 112 --> 56 --> 28 --> 14 --> 7, which means that the output of the feature section is a 512 * 7 * 7 tensor (512 channels, 7x7 spatial). This is the input to the "classifier" section.
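The arithmetic, written out:
size = 224
for _ in range(5):            # five MaxPool2d(kernel_size=2, stride=2) layers
    size //= 2
print(size)                   # 7
print(512 * size * size)      # 25088 == 512 * 7 * 7, the in_features of the first nn.Linear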
I'm writing a program that's using compressed textures in Metal. I'm having a bit of trouble with the replaceRegion() function of MTLTexture. The parameter bytesPerRow just doesn't seem to make sense. It says that for compressed textures, "bytesPerRow is the number of bytes from the beginning of one row of blocks to the beginning of the next."
Now I'm using ASTC with 4x4 blocks, which means I have 8 bpp. So 4*4 is 16 texels, and at 8 bits (one byte) each, I'm guessing each block is 16 bytes. Yet when I enter 16, I get a failed assertion that requires the minimum value to be 4096. What's going on?
Thank you very much.
bytesPerRow = texelsPerRow / blockFootprint.x * 16
uint32_t bytes_per_row(uint32_t texture_width, uint32_t block_width) {
    // Round the texture width up to a whole number of blocks,
    // then multiply by 16 bytes per block.
    uint32_t blocks_per_row = (texture_width + block_width - 1) / block_width;
    return blocks_per_row * 16;
}
This rounds up the texture width to a multiple of the block width first. E.g. a 1024x1024 texture encoded with a block size of 6x6 corresponds to 2736 bytes per row. 1026 / 6 * 16 == 2736.
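A quick check of the numbers (in Python; the 1024-texel width is an assumption inferred from the 4096-byte minimum mentioned in the question):
def bytes_per_row(texture_width, block_width, block_bytes=16):
    blocks_per_row = (texture_width + block_width - 1) // block_width   # round up to whole blocks
    return blocks_per_row * block_bytes

print(bytes_per_row(1024, 4))   # 4096 -- a 1024-texel-wide ASTC 4x4 texture, matching the asserted minimum
print(bytes_per_row(1024, 6))   # 2736 -- the 6x6 example above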
I am trying to calculate the compression ratio of a given image. My MATLAB code is as follows:
temp = imfinfo('flowers.jpg');
compression_ratio = (temp.Width * temp.Height * temp.BitDepth) / temp.FileSize;
The imfinfo displays the following:
FileSize: 11569
Format: 'jpg'
FormatVersion: ''
Width: 430
Height: 430
BitDepth: 8
ColorType: 'grayscale'
FormatSignature: ''
NumberOfSamples: 1
CodingMethod: 'Huffman'
CodingProcess: 'Sequential'
Comment: {}
Running the above code gives me a compression ratio of about 120, which is huge and does not seem right. Is there something that I'm doing wrong? I went through a document from MIT, and it showed that the product of Width, Height, and BitDepth should be divided by 8 and then divided by the FileSize. Why divide by 8?
The division by factor of 8 is to convert bits to bytes.
According to the Matlab documentation for imfinfo
the FileSize parameter is the size of the compressed file, in bytes.
The compression ratio is defined as:
uncompressed size of image in bytes/compressed size of file in bytes
imfinfo gives you the pixel width, height, and bits per pixel (bit depth). From that you can compute the uncompressed size in bits, and divide by 8 to get bytes.
For the uncompressed image, you have 430*430*8/8 = 184,900 bytes.
The size of the compressed image is 11569 bytes.
So the compression ratio is actually 184,900/11569 or 15.98, not an unreasonable value for JPEG.
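The same arithmetic written out (in Python, just for illustration):
width, height, bit_depth = 430, 430, 8
file_size = 11569                                     # bytes, from imfinfo

uncompressed_bytes = width * height * bit_depth / 8   # bits -> bytes
print(uncompressed_bytes)                             # 184900.0
print(uncompressed_bytes / file_size)                 # ~15.98, the compression ratio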
I'm generating an image using quartz2d and I want to use it as an opengl texture.
The tricky part is that I want to use as few bits per pixel as possible, so I'm creating the cgContext as follows:
int bitsPerComponent = 5;
int bytesPerPixel = 2;
int width = 1024;
int height = 1024;
void* imageData = malloc(width * height * bytesPerPixel);
CGColorSpaceRef colorSpace = CGColorSpaceCreateDeviceRGB();
CGContextRef context = CGBitmapContextCreate(imageData, width, height, bitsPerComponent, width * bytesPerPixel, colorSpace, kCGImageAlphaNoneSkipFirst);
//draw things into context, release memory, etc.
As stated in the documentation, this is the only supported RGB pixel format for CGBitmapContextCreate that uses 16 bits per pixel.
So now I want to upload this imageData, which is laid out as "1 bit skipped - 5 bits red - 5 bits green - 5 bits blue", into an OpenGL texture. So I should do something like this:
glGenTextures(1, &texture);
glBindTexture(GL_TEXTURE_2D, texture);
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, width, height, 0, GL_RGBA, GL_UNSIGNED_SHORT_5_5_5_1, imageData);
That won't work because in this call I've specified pixel format as 5 red - 5 green - 5 blue - 1 alpha. That is wrong, but it appears that there is no format that would match core graphics output.
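To make the mismatch concrete, here is an illustration of the two packings for a single pure-red pixel (a sketch with made-up values, not production code):
r, g, b = 31, 0, 0                              # 5-bit components
xrgb1555 = (r << 10) | (g << 5) | b             # CGBitmapContext layout: x rrrrr ggggg bbbbb
rgba5551 = (r << 11) | (g << 6) | (b << 1) | 1  # GL_UNSIGNED_SHORT_5_5_5_1: rrrrr ggggg bbbbb a
print(hex(xrgb1555), hex(rgba5551))             # 0x7c00 0xf801 -- different bit layouts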
There are some other options like GL_UNSIGNED_SHORT_1_5_5_5_REV, but those won't work on the iPhone.
I need some way to use this imageData as a texture, but I really don't want to swap bytes around manually using memset or such, because that seems terribly inefficient.
You do need to swap bits around to get it into a denser format like RGBA5551 or RGB565, since, as you note, CGBitmapContext does not support these formats for drawing (for simplicity and efficiency's sake).
memset isn't going to do the trick, but there are "fast" conversion routines in Accelerate.framework.
See vImageConvert_ARGB8888toRGB565(…) and vImageConvert_ARGB8888toARGB1555(…), available on iOS 5 and later.
For iOS 7.0, OS X 10.9 and later:
vImage_CGImageFormat fmt = {
    .bitsPerComponent = 5,
    .bitsPerPixel = 16,
    .colorSpace = NULL, // faster with CGImageGetColorSpace(cgImage) if known to be RGB
    .bitmapInfo = kCGImageAlphaNoneSkipFirst | kCGBitmapByteOrder16Little // ARGB1555 little endian
};
vImage_Buffer buf;
vImageBuffer_InitWithCGImage(&buf, &fmt, NULL, cgImage, kvImageNoFlags);
...
free(buf.data);
Data is in buf.data, along with image height, width and rowBytes info. I don't recall what GL's requirements are for whether row padding is allowed. You can control that by preallocating the buf.data and buf.rowBytes fields and passing kvImageDoNotAllocate in the flags.
565_REV is kCGImageAlphaNone | kCGBitmapByteOrder16Little.
5551_REV is kCGImageAlphaNoneSkipLast | kCGBitmapByteOrder16Little