JPEG Specification Questions: Walking through my current understanding of encoding to hopefully find what is wrong

I want to make a JPEG where, for each of the 3 components (Y, Cb, Cr), you encode an 8x8 block one after another, and then move to the next 8x8 block in the image.
E.g.:
A 16x16 image exists.
Write the header. (Is there anything special I need to mark? I opened a known JPEG to confirm I was writing the quantization tables and Huffman tables right; is there a special thing I need to set to make this format work? Also, I DON'T want subsampling. I want a 1:1 ratio; from my understanding this means I encode 8x8 pixels into one 8x8 block to push through the steps I am about to name, correct? How do I mark that in the header? With 0x11?)
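For reference, here is what I believe the SOF0 segment should contain for 1:1 sampling; a minimal sketch in Python (the component IDs and quantization-table assignments are just the conventional layout, not something I have verified against a real file):

import struct

def sof0_segment(width, height):
    # one (id, sampling, quant-table) triple per component;
    # sampling byte 0x11 means H=1, V=1, i.e. no subsampling (4:4:4)
    comps = [(1, 0x11, 0),   # Y
             (2, 0x11, 1),   # Cb
             (3, 0x11, 1)]   # Cr
    payload = struct.pack('>BHHB', 8, height, width, len(comps))
    for cid, sampling, qtab in comps:
        payload += struct.pack('BBB', cid, sampling, qtab)
    return b'\xff\xc0' + struct.pack('>H', len(payload) + 2) + payload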
Steps:
Grab the first 8x8 (top left) of this image.
For Y: DCT-II -> quant -> RLE -> Huffman encode
then, for Cb: DCT-II -> quant -> RLE -> Huffman encode
then, for Cr: DCT-II -> quant -> RLE -> Huffman encode
repeat for the top-right -> bottom-left -> bottom-right 8x8 pixel block in the image
write end of image tag, done.
In the data stream it should go: DC-Y -> AC-Y -> DC-Cb -> AC-Cb -> DC-Cr -> AC-Cr, and so forth, yes? Is there any tag I need to insert between components, between DC/AC changes, or between 8x8 pixel blocks? I assume an EOB Huffman code is present between components (that's what I have currently).
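To make that ordering concrete, a sketch of the scan loop as I picture it (encode_block here is a stand-in, not real entropy coding):

def encode_block(block, comp):
    return f'[{comp}: DC diff, AC run-lengths, EOB]'   # placeholder only

def encode_scan(mcus):
    # mcus: list of (Y, Cb, Cr) 8x8 blocks, left-to-right, top-to-bottom
    out = []
    for y, cb, cr in mcus:
        out += [encode_block(y, 'Y'), encode_block(cb, 'Cb'), encode_block(cr, 'Cr')]
    return out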
Negative numbers:
What format are they? Two's complement? -3, for example, would be 101 in two's complement (3-bit size), but in JPEG you would call this 2-bit size and only encode the 01 portion, not the "sign" or MSB bit, right? 3 would be 011 in 3-bit two's complement, but by the same logic it's just 11 (2-bit size) and encoded without the sign (MSB) in JPEG, right? Anything I am missing?
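For comparison, here is the extra-bits rule as I read it in Annex F of the spec; a sketch (note that for negative values it works out to the one's complement of the magnitude, which is not always the same as two's complement with the MSB dropped):

def size_and_bits(v):
    # SIZE is the bit length of |v|; 0 has SIZE 0 and no extra bits
    size = abs(v).bit_length()
    if v < 0:
        v += (1 << size) - 1        # one's complement of the magnitude
    return size, (format(v, f'0{size}b') if size else '')

# size_and_bits(3) -> (2, '11'); size_and_bits(-3) -> (2, '00')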
DC vals:
3 components mean you keep track of 3 different previous DC vals, right? For example, Y-DC-prev is initialized to 0. Then the first Y-DC val is, let's say, 25. 25 - 0 = 25, so we encode 25. We then remember 25 for the Y component's next DC (not for the Cb or Cr components, right? They have their own "memories"?). Then DC-Y is, let's say, 40. Diff = 40 - 25 = 15, encode 15, remember 40 (not 15, right?). And so forth?
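In code form, my understanding of the three predictors is this (a sketch):

prev_dc = {'Y': 0, 'Cb': 0, 'Cr': 0}    # one independent predictor each

def dc_diff(comp, dc):
    diff = dc - prev_dc[comp]
    prev_dc[comp] = dc                  # remember the actual DC, not the diff
    return diff

# dc_diff('Y', 25) -> 25, then dc_diff('Y', 40) -> 15; Cb/Cr are untouched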
I followed the example here: WIKI. My code can get the exact values all the way down to RLE, which makes me think my Huffman encoding might have the bug. I made a 16x16 image that basically repeats the image from Wikipedia in a 2x2 tile (this also makes the image not greyscale, since I force Cb and Cr to have the same values as Y; I know the image should have a funky tint because of this, no worries). I end up getting a semi-believable value for the top-right block, then the rest turn into garbage. This led me to believe it's my file organization or Huffman encoding that is going wrong. As a quick check (this is from the Wikipedia example):
FORMAT: (RUNLENGTH, SIZE)(VALUE)
(0, 2)(-3);
(1, 2)(-3);
(0, 1)(-2);
(0, 2)(-6);
(0, 1)(2);
(0, 1)(-4);
(0, 1)(1);
(0, 2)(-3);
(0, 1)(1);
(0, 1)(1);
(0, 2)(5);
(0, 1)(1);
(0, 1)(2);
(0, 1)(-1);
(0, 1)(1);
(0, 1)(-1);
(0, 1)(2);
(5, 1)(-1);
(0, 1)(-1);
(0, 0);
The standard Huffman AC-Y table in the spec (TABLE-PAGE154) says 0/2 is code 01. We know that -3 is 01 in two's comp, so we append 0101 to the stream and then get to the next entry. 1/2 is 11011 from the table, and -3 is still 01, so we append 1101101 to the stream and keep going... all the way to the end, where we see the (0, 0), which is just 1010. Then we rinse and repeat for the 2 other components, then rinse and repeat for the rest of the 8x8 pixel blocks in the image, yes? The DC val was -26, which is 00110 (size 5) in two's comp without the MSB/sign. Size 5 for DC-Y codes to 110 according to the Huffman table in the spec (page 153). This means the bit stream should start:
110_00110_01_01_11011_01_...
Obviously the _ are just for readability, I don't add those to the actual file.
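For what it's worth, here is how I picture the AC packing loop; a sketch with only the AC codes quoted above filled in, and with the value-bit rule left as a parameter, since that is exactly the part I am unsure about:

AC_Y = {(0, 2): '01', (1, 2): '11011', (0, 0): '1010'}   # partial table only

def pack_ac(rle_pairs, value_bits):
    # rle_pairs: [(run, size, value), ...], ending with (0, 0, None) for EOB
    out = ''
    for run, size, value in rle_pairs:
        out += AC_Y[(run, size)]
        if size:                         # EOB carries no extra value bits
            out += value_bits(value, size)
    return out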
This is the image I am getting so far, for anyone curious: incorrect image. I hard-coded the 8x8 blocks to always match the ones from Wikipedia, so we should see a tiled form of the image; it should be off-color due to the two new chroma components (given the exact same values as Y).
I've been working on this for days, any help is much appreciated!!

Related

Strange shading behaviour with normal maps in UE4

I've been having some very strange lighting behaviour in Unreal 4. In short, here's what I mean:
Fig 1, First, without any normal mapping on the bricks.
Fig 2, Now with a normal map applied, generated based on the same black-and-white brick texture.
Fig 3, The base pixel normals of the objects in question.
Fig 4, The generated normals which get applied.
Fig 5, The material node setup which produces the issue, as shown in Fig 2
As you can see, the issue occurs when using the normals generated by the HeightToNormalSmooth node. As shown, this is not an issue relating to object normals (see Fig 3) or to a badly exported normal map (there isn't one in the traditional sense), nor is it an issue with the HeightToNormalSmooth node itself (Fig 4 shows that it generates the correct bump normals).
To be clear, the issue here is that using a normal texture at all (this occurs across all my materials) causes the positive-Y-facing faces of an object to turn completely black (or, it seems, to become purely reflection-based, as increasing roughness on the material makes the black faces look less 'shiny').
This is really strange; I've tested with multiple different skylight setups and sun directions, yet this always happens (even when lit directly), but only on +Y aligned faces.
If anyone can offer insight that would be greatly appreciated.
You're subtracting what looks like 1 from the input that then goes into the multiply node, if I'm reading it right. This will, in most cases, make any image return black. In UE4 and many other programs, the colors in an image are given as decimal values for Red, Green, and Blue, each in a range of 0 to 1. So if I wanted to make red, I could use these values: R = 1, G = 0, B = 0. This matters because if R = 0, G = 0, B = 0, the result is black. When you use a multiply node as in your example, UE4 takes each pixel of the image you fed into the node (if it was white, R = 1, G = 1, B = 1) and multiplies its R, G, and B values by that number. Since zero multiplied by any number equals zero, all the pixels in the image are being set to R = 0, G = 0, B = 0. Thus, all zeros, and you get black.
I also noticed you then multiply by one, which in most cases won't do a whole lot, since you're just multiplying the input by 1. If your input is 0 (black), multiplying it by one won't change it, because 0 * 1 still equals 0.
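As a plain-numbers illustration of the arithmetic described above (Python, just to show the per-channel math):

# a white pixel, normalized to the 0..1 range UE4 uses
pixel = (1.0, 1.0, 1.0)
after_subtract = tuple(c - 1.0 for c in pixel)            # (0.0, 0.0, 0.0): black
after_multiply = tuple(c * 1.0 for c in after_subtract)   # unchanged, still black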
To fix your issue, try changing the value you subtract from your input to something smaller than one, say a decimal such as 0.6 or 0.5.
So I've discovered why this was an issue. It turns out there's a little option in the material settings called 'Tangent Space Normal'. This is on by default ('for convenience'); disabling it appears to completely fix the issue with the generated normal maps produced by HeightToNormalSmooth.

Decoding Keyence LJ-X8000 Bitmap-Height Image

I have a Keyence line laser system LJ-X8000 that I use to scan the surface of different objects.
The controller saves the height information as a bitmap, with each pixel representing one height value. After a lot of tinkering, I found out that Keyence is not using the actual colors, but rather uses the 24-bit RGB triplets as a form of binary storage. However, no combination of these bytes seems to work for me. Are there any common storage methods for 24-bit integers?
To decode those values, I did a scan covering the whole measurement range of the scanner, including some out-of-range values at the beginning and the end. If you look at the distribution of the values in each color plane, you can see that the first and third planes only use values up to 8 and 16 respectively, which means only 3 and 4 bits. This is also visible in the image itself, as it mainly shows a green color.
I concluded that Keyence uses the full byte of the green color plane, 3 bits of the first (red) plane, and 4 bits of the last (blue) plane to store the height information. Keyence seems to have chosen some weird 15-bit integer format to store their data.
With a little bit-shifting, and knowing that the scanner has a valid range of [-2.2, 2.2], I was able to build the following simple little (MATLAB) script to calculate the height information for each pixel:
% combine the three colour planes into one 15-bit value:
% green fills bits 7-14, red bits 4-6, blue bits 0-3
HeightValBin = bitshift(uint16(scanIm(:,:,2)), 7) ...
             + bitshift(uint16(scanIm(:,:,1)), 4) ...
             + uint16(scanIm(:,:,3));
% map the raw 15-bit range onto the scanner's [-2.2, 2.2] range
scanBinValScaled = interp1([0, 2^15], [-2.2, 2.2], double(HeightValBin));
Keyence offers software to convert those .bmp files into .csv files, but no API to automate the process. As I will have to deal with a lot of these files, I needed to automate it.
The values calculated from the RGB triplets are actually even more precise than the exported CSV, as the CSV only shows 4 digits after the decimal point.
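Since the point is batch processing, the same decoding ports easily; a sketch in Python with numpy and Pillow ('scan.bmp' is a placeholder file name):

import numpy as np
from PIL import Image

im = np.asarray(Image.open('scan.bmp')).astype(np.uint16)
r, g, b = im[..., 0], im[..., 1], im[..., 2]
raw = (g << 7) | (r << 4) | b          # reassemble the 15-bit height value
height = raw / 2**15 * 4.4 - 2.2       # map [0, 2^15] onto [-2.2, 2.2]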

Why don't all pixels have the same weight?

I don't understand:
If we consider that the value 00001111 (15) is one byte, then an RGB pixel (220,180,155) is 3 bytes, whatever the values of the pixel.
So why, when I reduce the values of my pixels (with a bit-shift operation or whatever), is the size of my image not equal to pixel count x 3? When I say "pixel count" I mean "the number of pixels brighter than fully black".
How does the mechanism work? Is it counted in bits and then divided by eight as an average?
If I have a 3 MB picture and I do a bit shift (factor 2 on each of the 3 RGB channels), I end up with a 300 KB picture.
Don't tell me 90% of my pixels turned fully black.
Thanks.
If you shift all the pixel values right by 2 places, you will have around 1/4 as many shades of red as before, and likewise around 1/4 as many greens and blues. That means overall you will have vastly fewer colours, so your image may well have fewer than 256 colours, which means it can be palettised. It also means it is likely to compress better, because there will be more repetition of fewer unique sequences.
You can check if your image is palettised in several ways:
open it with PIL and check if image.mode contains a P
run exiftool on it and check if Colour Type is Palette
run ImageMagick on it with magick identify -verbose YOURIMAGE
You can count the number of unique colours in your image with ImageMagick using:
magick identify -format %k YOURIMAGE
Or you can do it in Python with the last part (entitled "Update") of this answer.
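A quick way to see the whole effect end to end; a sketch with Pillow (file names are placeholders):

from PIL import Image

im = Image.open('input.png').convert('RGB')
shifted = im.point(lambda v: v >> 2)      # shift each channel right by 2
print(len(set(im.getdata())), 'colours before')
print(len(set(shifted.getdata())), 'colours after')   # far fewer colours
shifted.save('shifted.png')               # typically compresses much smaller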

Trying to understand how a 1-bit BMP image is drawn

As can be seen in this example, each channel (R, G, B) in a BMP file takes a value. A 24-bit BMP image has 8 bits for R, 8 bits for G, and 8 bits for B. I saved an image in MS Paint as monochrome (black and white). Its properties say the image's depth is 1 bit. The question is: which channel gets this 1 bit - R, G, or B? Isn't it mandatory that all three channels get some value? I am not able to understand how MS Paint has drawn this BMP image using 1 bit.
Thanks in advance for your replies.
There are multiple ways to store a bitmap. In this case, the important distinction is RGB versus indexed.
In an RGB bitmap, every pixel is associated with three separate values, one for red, another for green, and another for blue. Depending on the "bitness" (bit depth) and the specific pixel format, the different colour channels can have different amounts of bits allocated to them - the simplest case is typical true colour with 8 bits for each of the channels, and another (optional) 8 bits for the alpha channel. However, some pixel formats allocate their bits differently - the idea is that the human eye has different sensitivity to each of those channels, so you can save space and improve visual quality by allocating the bits in a smarter way. For example, one of the more popular pixel formats is BGR-565 - that is, 16 bits total: 5 bits for blue, 6 bits for green and 5 bits for red.
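For instance, packing a colour into that layout might look like this (a sketch; channel order follows the description above):

def pack_bgr565(r, g, b):
    # drop the low bits of each 8-bit channel, then pack: B high, G middle, R low
    return ((b >> 3) << 11) | ((g >> 2) << 5) | (r >> 3)

# pack_bgr565(255, 255, 255) -> 0xFFFF (white)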
In an indexed bitmap, the value stored for each pixel is an index (hence "indexed bitmap") into a palette (a colour table). The palette is usually a simple table of colours, using RGB "pixel" formats to assign each index some specific colour. For example, index 0 might mean black, 1 might mean turquoise, etc.
In this case, the bit depth doesn't exactly map to colour quality - you're not trying to map the whole colour space, you're focusing on some subset of the possible colours instead. For example, if you have 256 shades of grey (say, from black to white), a true-colour bitmap would need at least three bytes per pixel (and each of those three bytes would have the same value), while you could use an indexed bitmap with a palette of all the grey colours, requiring only one byte per pixel (plus the cost of the palette - 256 * 3 bytes). There are a lot of benefits to using indexed bitmaps, and a lot of tricks to improve the visual quality further without using more bits per pixel, but that would be way beyond the scope of this question.
This also means that you only need as many possible values as you want to show. If you only need 16 different colours, you only need four bits per pixel. If you only need a monochromatic bitmap (that is, a pixel is either "on" or "off"), you only need one bit per pixel - and that's exactly your case. Given the number of distinct colours you need, you can get the required bit depth by taking a base-2 logarithm (e.g. log2(256) = 8).
So let's say you have an image that only uses two colours - black and white. You build a palette with two colours, black and white. And for each of the pixels in the bitmap, you save either 0 if it's black, or 1 if it's white.
Now, when you want to draw a bitmap like this, you simply read the palette (0 -> RGB(0, 0, 0), 1 -> RGB(1, 1, 1) in this case), and then you read one pixel after another. If the bit is zero, paint a black pixel. If it's one, paint a white pixel. Done :)
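A sketch of that read loop in Python, with one row's bits written by hand (a real BMP also has a header and row padding, which I ignore here):

palette = [(0, 0, 0), (255, 255, 255)]   # index 0 -> black, index 1 -> white

row_byte = 0b10110001                    # 8 pixels packed MSB-first
pixels = [palette[(row_byte >> (7 - i)) & 1] for i in range(8)]
# -> white, black, white, white, black, black, black, white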
No, it depends on the type of data you chose to save as. Because you chose to save as monochrome, the RGB mapping is not used here; the mapping would be one byte per pixel, ranging from white to black.
Each type has its own mapping. Saving as 24-bit gives you RGB mapping; saving as 256 colours maps a byte to each pixel, where each value represents a colour (you can find the table on the internet); as for monochrome, you'll have the same as a 256-colour bitmap, but the colour table will only have white and black.
Sorry for the mistake - the way I explained monochrome is actually how grayscale works. Monochrome uses one bit per pixel to indicate whether the pixel is black or white, depending on the value of the bit; no mapping table is used.

Get the length of an irregular object in a BW or RGB picture and draw it into the picture for control

I face a well-known problem which I am not able to solve.
I have a picture of a root (http://cl.ly/image/2W3C0a3X0a3Y). From this picture, I would like to know the length of the longest root (first problem) and the proportions of big and small roots in % (say, with the diameter as an orientation), which is the second problem. It is important that I can distinguish between fine and big roots, since this is more or less the aim of the study (their proportions compared between different species). Lastly, I would like to draw a line along the measured longest root to check that everything was measured right.
For the length of the longest root, I tried to use regionprops(), which is not optimal, since (if I got this right) it assumes an oval as the basic shape.
However, the things I could really need support with are in fact:
How can I get the length of the longest root (start point should be the place where the longest root leaves the main root with the biggest diameter)?
Is it possible to distinguish between fine and big roots, and can I get their proportions? (The coin, the round object in the image, is the reference.)
Can I draw properties like length and diameter into the picture?
I found out how to draw the centroids of ovals and such, but I just don't understand how to do it with the proposed values.
I hope this is not a double post and that this question does not already exist somewhere else; if it does, I am sorry.
I would like to thank the people on this forum; you do a great job, and everybody with a question is lucky to have you here.
Thank you for the help,
Phillip
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
EDIT
I followed the proposed solution; the code so far is as follows:
clc
clear all
close all
img = imread('root_test.jpg');
labTransformation = makecform('srgb2lab');
labI = applycform(img, labTransformation);
% separate L, a, b channels
l = labI(:,:,1);
a = labI(:,:,2);
b = labI(:,:,3);
% binarize the L channel with Otsu's threshold, then invert
level = graythresh(l);
bw = im2bw(l, level);
bw = ~bw;
bw = bwareaopen(bw, 200);   % remove small components
se = strel('disk', 5);
bw2 = imdilate(bw, se);     % join minor features
bw2 = imfill(bw2, 'holes');
bw3 = bwmorph(bw2, 'thin', 5);
bw3 = double(bw3);
I4 = bwmorph(bw3, 'skel', 200);
%se = strel('disk', 10);    % this step is for better visibility of the line
%bw4 = imdilate(I4, se);
D = bwdist(I4);
This gets me to the skeleton picture - which is great progress, thank you for that!!!
I am a little lost at the point where I have to calculate the distances. How can I tell MATLAB that it has to calculate the distance from all the small roots to the main root (and how do I define the main root)? For this I have to work with the diameters first, right?
Could you maybe give one or two more hints on how to tackle the distance/length problem?
Thank you for the great help so far!
Phillip
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
EDIT2
OK, I managed to separate the individual root parts. This is not what your edit proposed, but it is at least something. I have the summed length of all roots as well - not too bad. But even with the (I assume) super easy step-by-step explanation, I have never seen such a tree. I stopped at the point at which I have to select an invisible point - the rest is too advanced for me.
I don't want to waste more of your time, and I am very thankful for the help you have given me already. But I suppose I am too MATLAB-stupid to accomplish this :)
Thanks! Keep going like this, it is really helpful.
Phillip
As a pre-starting point, I don't see the need for a resolution of 3439x2439 for that image; it doesn't seem to add anything important to the problem, so I simply worked with a resized version of 800x567 (although there should be (nearly) no problem applying this answer to the larger version). Also, you mention regionprops, but I didn't see any description of how you got your binary image, so let us start from the beginning.
I considered your image in the LAB colorspace, then binarized the L channel by Otsu, applied a dilation to this result considering the foreground as black (the same could be done by applying an erosion instead), and finally removed small components. The L channel gives a better representation of your image than the more direct luma formula, leading to an easier segmentation. The dilation (or erosion) is done to join minor features, since there are quite a few ramifications that appear to be irrelevant. This produced the following image:
At this point we could attempt using the distance transform combined with the grey-tone anchored skeleton (see Soille's book on morphology, and/or "Order Independent Homotopic Thinning for Binary and Grey Tone Anchored Skeletons" by Ranwez and Soille). But since the latter is not easily available, I will consider something simpler here. If we perform hole filling in the image above, followed by thinning and pruning, we get a rough sketch of the connections between the many roots. The following image shows the result of this step composed with the original image (and dilated for better visualization):
As expected, the thinned image takes "shortcuts" due to the hole filling. But if such a step wasn't performed, we would end up with cycles in this image -- something I want to avoid here. Nevertheless, it seems to provide a decent approximation to the size of the actual roots.
Now we need to calculate the sizes of the branches (or roots). The first thing is deciding where the main root is. This can be done by using the above binary image before the dilation and considering the distance transform, but it will not be done here -- my interest is only in showing the feasibility of calculating those lengths. Supposing you know where your main root is, we need to find a path from a given root to it, and then the size of this path is the size of this root. Observe that if we eliminate the branch points from the thinned image, we get a nice set of connected components:
Assuming each end point is the end of a root, the size of a root is the shortest path to the main root, and that path is composed of a set of connected components in the image just shown. Now you can find the largest one, the second largest, and all the others that can be calculated by this process.
EDIT:
In order to make the last step clear, first let us label all the branches found (open the image in a new tab for better visualization):
Now, the "digital" length of each branch is simply the amount of pixels in the component. You can later translate this value to a "real-world" length by considering the object added to the image. Note that at this point there is no need to depend on Image Processing algorithms at all, we can construct a tree from this representation and work there. The tree is built in the following manner: 1) find the branching point in the skeleton that belongs to the main root (this is the "invisible point" between the labels 15, 16, and 17 in the above image); 2) create an edge from that point to each branch connected to it; 3) assign a weight to the edge according to the amount of pixels needed to travel till the start of the other branch; 4) repeat with the new starting branches. For instance, at the initial point, it takes 0 pixels to reach the beginning of the branches 15, 16, and 17. Then, to reach from the beginning of the branch 15 till its end, it takes the size (number of pixels) of the branch 15. At this point we have nothing else to visit in this path, so we create a leaf node. The same process is repeated for all the other branches. For instance, here is the complete tree for this labeling (the dual representation of the following tree is much more space-efficient):
Now you find the largest weighted path -- which corresponds to the size of the largest root -- and so on.
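If it helps to see the branch-measuring step in code, here is a sketch (in Python with scipy/scikit-image, since that is what I have at hand; in MATLAB, bwmorph's 'branchpoints' option plus bwlabel would play the same role):

import numpy as np
from scipy import ndimage as ndi
from skimage.morphology import skeletonize

def branch_pixel_counts(mask):
    # skeletonize, then count each pixel's 8-connected skeleton neighbours
    skel = skeletonize(mask)
    neighbours = ndi.convolve(skel.astype(int), np.ones((3, 3), int),
                              mode='constant') - skel
    arcs = skel & ~(neighbours >= 3)     # removing branch points leaves arcs
    labels, n = ndi.label(arcs, structure=np.ones((3, 3), int))
    # "digital" length of each branch = its pixel count
    return ndi.sum(arcs, labels, index=range(1, n + 1))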