JPEG SOF0 subsampling components - encoding

With JPEGsnoop, for a 4:2:2 horizontal (YYCbCr) image, I see this in the SOF0:
Component[1]: ID=0x01, Samp Fac=0x21 (Subsamp 1 x 1), Quant Tbl Sel=0x00 (Lum: Y)
Component[2]: ID=0x02, Samp Fac=0x11 (Subsamp 2 x 1), Quant Tbl Sel=0x01 (Chrom: Cb)
Component[3]: ID=0x03, Samp Fac=0x11 (Subsamp 2 x 1), Quant Tbl Sel=0x01 (Chrom: Cr)
Now, where are the values 0x21 and 0x11 coming from?
I know that sampling factors are stored in one byte (bits 0-3 are the vertical factor, bits 4-7 the horizontal factor),
but I don't see how 0x11 relates to 2x1 and 0x21 to 1x1.
I expected to see 0x11 for the Y component, not 0x21 (I'm not sure how you get 0x21 as a result).
Can somebody explain these values and how to calculate them for, say, 4:2:2 horizontal (16x8)?

JPEG does it bassackwards. The values indicate RELATIVE SAMPLING RATES.
The highest sampling rate is for Y (2); the sampling rate for Cb and Cr is 1.
Use the highest sampling rate to normalize to pixels:
2Y = Cb = Cr, i.e. Y = 1/2 Cb = 1/2 Cr.
For every Y pixel value in that direction you use 1/2 a Cb and 1/2 a Cr pixel value.
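As a minimal sketch of how the raw byte unpacks (following the nibble layout quoted in the question):
sampFac = hex2dec('21');        % Samp Fac byte of the Y component above
hRate = bitshift(sampFac, -4)   % high nibble: 2, the horizontal sampling rate
vRate = bitand(sampFac, 15)     % low nibble:  1, the vertical sampling rate
Cb and Cr carry 0x11, i.e. 1x1, so Y is sampled twice as densely horizontally relative to chroma: 4:2:2 horizontal.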
According to the JPEG standard, you could even have something like:
4Y = 3Cb = 1Cr, i.e. Y = 3/4 Cb = 1/4 Cr
or
3Y = 2Cb = 1Cr, i.e. Y = 2/3 Cb = 1/3 Cr
But most decoders could not handle that.
The labels like "4:4:4", "4:2:2", and "4:4:0" are just that: labels, and they are not in the JPEG standard. Quite frankly, I don't even know where those terms come from, and they are not intuitive at all (there is never a zero sampling rate).
Let me add another way of looking at this problem. But first, you have to keep in mind that the JPEG standard itself is not implementable: things necessary to encode images are left undefined, and the standard sprawls with unnecessary stuff.
If a scan is interleaved (all three components), it is encoded in minimum coded units (MCUs). An MCU consists of 8x8 encoded blocks.
The sampling rates specify the number of 8x8 blocks in an MCU.
You have 2x1 for Y plus 1x1 for Cb and 1x1 for Cr, so a total of four 8x8 blocks are in an MCU. While I mentioned other theoretical values above, the maximum number of blocks in an MCU is 10; thus 4x4 + 3x3 + 2x2 is not possible.
The JPEG standard does not say how those blocks are mapped to pixels in an image. We usually use the largest value and say that we have a 2x1 zone, or 16x8 pixels.
But all kinds of weirdness is possible under the standard, such as:
Y = 2x1, Cb = 1x2 and Cr = 1x1
That would probably mean an MCU maps to a 16x16 block of pixels, but your decoder would probably not support this. Alternatively, it might mean an MCU maps to a 16x8 block of pixels, with the Cb component having more values in the vertical direction.
A final way of viewing this (the practical way) is to use the Y component as a reference point. Assume that Y is always going to have 1 or 2 (and maybe 4) as the sampling rate in the X and Y directions, and that the rates for Cb and Cr are going to be 1 (and maybe 2). The Y component always defines the pixels in the image.
These would then be realistic possibilities:
Y Cb Cr
1x1, 1x1, 1x1
2x2, 1x1, 1x1
4x4, 1x1, 1x1
2x1, 1x1, 1x1
1x2, 1x1, 1x1
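As a small sketch of the usual mapping from sampling rates to MCU geometry (following the "largest value" reading above, for the 4:2:2 horizontal case):
sampY  = [2 1];   % [horizontal vertical] rates for Y
sampCb = [1 1];
sampCr = [1 1];
blocksPerMCU = prod(sampY) + prod(sampCb) + prod(sampCr)   % 2+1+1 = 4 (must not exceed 10)
mcuPixels = 8 * max([sampY; sampCb; sampCr])               % [16 8] pixels per MCU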

Interpolation on 4D data

I am trying to perform an interpolation/fit (preferably non-linear, but linear should also be fine) on 4D data. My data has the form:
[a,b,c] = func(input)
Obviously, func is unknown, and ultimately the data looks like (input, a, b, c):
0 -0.1253 0.0341 0.01060
35 -0.0985 0.0176 0.02060
50 -0.0315 -0.0533 0.1118
60 -0.0518 -0.0327 0.03020
80 0.2939 -0.0713 0.05670
100 0.3684 -0.0765 0.06740
I take observations at e.g. input = [0, 35, 50, 60, 80, 100] (0 being the min and 100 the max; I take 6 samples between min and max) and then I get the corresponding a, b and c values (I understand that 6 sample points is a poor design of experiment, so I will extend it in the future).
I am trying to guess the values of a, b and c at, say, input = 19. Any pointers?
How do I estimate goodness of fit in such a scenario?
This is not 4D interpolation; this is 3 times 1D interpolation. You just interpolate interp1([0 35],[-0.1253 -0.0985],19), and the same for b and c (interp1(input,a,19)).
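A minimal sketch with the sample data from the question (the 'pchip'/'spline' methods are just illustrative choices if you want a non-linear fit):
input = [0 35 50 60 80 100];
a = [-0.1253 -0.0985 -0.0315 -0.0518 0.2939 0.3684];
b = [ 0.0341  0.0176 -0.0533 -0.0327 -0.0713 -0.0765];
c = [ 0.0106  0.0206  0.1118  0.0302  0.0567  0.0674];
q = 19;
aq = interp1(input, a, q)            % linear (the default)
bq = interp1(input, b, q, 'pchip')   % shape-preserving cubic
cq = interp1(input, c, q, 'spline')  % cubic spline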
Note that for the most basic 1D interpolation on a grid (not what you have), you need 2 data points in general. For the most basic 2D interpolation you need 4 data points, for 3D interpolation 8, for 4D 16, ... (2^d in general).
Also note that 1D interpolation uses 2 "dims": one guides the interpolation, the other is interpolated. In general, with [v,a,b,c] data you would use 3D interpolation.
All that said, you are not in that case here. You have scattered data, not a grid, thus the problem becomes considerably more complicated.
In case you can generate a few more points (not necessarily 16), you can use the function griddatan for interpolating scattered data. Note that you cannot just say "give me [a,b,c] for input=19": there could be an infinite number of [a,b,c] triplets that satisfy that condition. In any case, you always need to give dim-1 of the coordinates as sample points and get the last one interpolated. Just a piece of advice: this function is computationally and memory-wise very expensive, so do not use it on big data sets because it will crash your PC.
In case you want to find a set of parameters that corresponds to input=19, you are getting into a more complicated area. You want to find the x = [a,b,c] that satisfies f(x) = input, which you can pose as the minimisation
argmin_x |f(x) - input|^2
This is a harder problem, and arguably more a mathematics question than a programming one. Perhaps an N-D B-spline fit of your data would be a good f.

Confusion in different HOG codes

I have downloaded three different HOG codes.
Using an image of 64x128:
1) Using the MATLAB function extractHOGFeatures:
[hog, vis] = extractHOGFeatures(img,'CellSize',[8 8]);
The size of hog is 3780.
How to calculate:
HOG feature length, N, is based on the image size and the function parameter values.
N = prod([BlocksPerImage, BlockSize, NumBins])
BlocksPerImage = floor((size(I)./CellSize - BlockSize)./(BlockSize - BlockOverlap) + 1)
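Plugging in the documented defaults (BlockSize = [2 2], BlockOverlap = [1 1], NumBins = 9) for the 64x128 image: the cell grid is size(I)./CellSize = [8 16], so BlocksPerImage = floor(([8 16] - [2 2])./([2 2] - [1 1]) + 1) = [7 15], and N = 7*15 * 2*2 * 9 = 3780.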
2) the second HOG function is downloaded from here.
Same image is used
H = hog( double(rgb2gray(img)), 8, 9 );
% I - [mxn] color or grayscale input image (must have type double)
% sBin - [8] spatial bin size
% oBin - [9] number of orientation bins
The size of H is 3024
How to calculate:
H - [m/sBin-2 n/sBin-2 oBin*4] computed hog features
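For the 64x128 image that works out to (64/8 - 2) x (128/8 - 2) x (9*4) = 6 x 14 x 36 = 3024: judging by the size formula, this implementation drops a one-cell border and stores 4 block normalizations per orientation bin.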
3) HoG code from vl_feat.
cellSize = 8;
hog = vl_hog(im2single(rgb2gray(img)), cellSize, 'verbose','variant', 'dalaltriggs') ;
vl_hog: image: [64 x 128 x 1]
vl_hog: descriptor: [8 x 16 x 36]
vl_hog: number of orientations: 9
vl_hog: bilinear orientation assignments: no
vl_hog: variant: DalalTriggs
vl_hog: input type: Image
the output is 4608.
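That is just the reported descriptor dimensions multiplied out: 8 x 16 x 36 = 4608. The DalalTriggs variant keeps all 8x16 cells (no border is dropped) and stores 36 values per cell (9 orientations x 4 normalizations).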
Which one is correct?
All are correct. The thing is, the default parameters of HOG feature extraction functions vary between packages (e.g. OpenCV, MATLAB, scikit-image, etc.). By parameters I mean winsize, stride, blocksize, scale, etc.
Usually HOG descriptor length is :
Length = Number of Blocks x Cells in each Block x Number of Bins in each Cell
Since all are correct, which one you should use can be answered in many ways.
You can experiment with different parameter values and choose the ones that suit you. Since there is no fixed way to find the right values, it helps to know how a change in each parameter affects the result.
Cell size: if you increase this, you may not capture small details.
Block size: again, a large block with a large cell size may not help you capture the small details. Also, a large block spans more illumination variation, and due to the gradient normalization step a lot of detail will be lost. So choose accordingly.
Overlap/stride: choosing overlapping blocks helps you capture more information about the image patch. Usually the stride is set to half the block size.
You can capture more information by choosing these parameter values accordingly, but the descriptor length will become unnecessarily long.
Hope this helps :)

Matlab - Dilation function alternative

I'm looking through various online sources, trying to learn some new stuff with MATLAB.
I came across a dilation function, shown below:
function rtn = dilation(in)
h = size(in,1);
l = size(in,2);
rtn = zeros(h,l,3);
rtn(:,:,1) = [in(2:h,:); in(h,:)];
rtn(:,:,2) = in;
rtn(:,:,3) = [in(1,:); in(1:h-1,:)];
rtn_two = max(rtn,[],3);
rtn(:,:,1) = [rtn_two(:,2:l), rtn_two(:,l)];
rtn(:,:,2) = rtn_two;
rtn(:,:,3) = [rtn_two(:,1), rtn_two(:,1:l-1)];
rtn = max(rtn,[],3);
The parameter it takes is: max(img,[],3) %where img is an image
I was wondering if anyone could shed some light on what this function appears to do, and whether there's a better (or less confusing) way to do it? Apart from a small wiki entry, I can't seem to find any documentation, hence asking for your help.
Could this be achieved with the imdilate function maybe?
What this is doing is creating two copies of the image shifted by one pixel up/down (with the last/first row duplicated to preserve size), then taking the max value of the 3 images at each point to create a vertically dilated image. Since the shifted copies and the original are layered in a 3-d matrix, max(img,[],3) 'flattens' the 3 layers along the 3rd dimension. It then repeats this column-wise for the horizontal part of the dilation.
For a trivial image:
00100
20000
00030
Step 1:
(:,:,1) (:,:,2) (:,:,3) max
20000 00100 00100 20100
00030 20000 00100 20130
00030 00030 20000 20030
Step 2:
(:,:,1) (:,:,2) (:,:,3) max
01000 20100 22010 22110
01300 20130 22013 22333
00300 20030 22003 22333
You're absolutely correct, this would be simpler with the Image Processing Toolbox:
rtn = imdilate(in, ones(3));
With the original code, dilating by more than one pixel would require multiple iterations, and because it operates one dimension at a time, it's limited to square (or possibly rectangular, with a bit of modification) structuring elements.
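For what it's worth, a larger dilation doesn't need iteration with imdilate either, since square structuring elements compose (two passes of ones(3) equal one pass of ones(5)). A sketch, where r is a hypothetical radius parameter:
r = 2;                              % dilation radius in pixels
rtn = imdilate(in, ones(2*r + 1));  % single pass with a 5x5 square structuring element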
Your function replaces each element with the maximum value in the corresponding 3x3 neighbourhood. By creating a 3D matrix, the function aligns each element with two of its shifted copies, which equivalently achieves the 3x3 kernel. This alignment is done twice, to take the maximum along each column and each row respectively.
You can generate a simple matrix to compare the result with imdilate:
a=magic(8)
rtn = dilation(a)
b=imdilate(a,ones(3))
Besides imdilate, you can also use
c=ordfilt2(a,9,ones(3))
to get the same result (it implements a 3-by-3 maximum filter: the 9th-ranked value in a 3x3 neighbourhood is its maximum).
EDIT
You can try imdilate on a 3D image as well:
a(:,:,1)=magic(8);
a(:,:,2)=magic(8);
a(:,:,3)=magic(8);
mask = true(3,3,3);
mask(2,2,2) = false;
d = imdilate(a,mask);

Matlab - Modelling Image Formation Algorithm Implementation

I'm trying to implement an algorithm in MATLAB.
The algorithm (or stages) is as follows:
(1) Choose an illuminant.
(2) Calculate colour signals for all the 24 reflectances under that illuminant.
(3) Multiply, element-by-element, each sensor-response vector (columns or R (see variables)) by the colour signal.
(4) Sum the result over all wavelengths (which should leave me with 72 values: 24 R values (one for each surface), 24 G values, and 24 B values).
(5) Create an image from the calculated sensor responses for each reflectance by assigning each a 100x100 pixel square, creating a pattern of 4 rows and 6 columns (like a Macbeth ColourChecker).
I think I'm getting confused at stage 4 (but I might be implementing it wrong earlier)...
These are my variables:
A %an illuminant vector of size 31x1.
R %colour camera sensitivities of size 31x3 (the columns of this matrix are the red, green, and blue sensor response functions of a camera).
S %surface reflectances (24) of size 31x24 from a Macbeth ColourChecker (each column is a different reflectance function).
WAV %Reference wavelength values of size 31x1.
This is what I've implemented:
(1) choose A (as it's the only one I've made):
A;
(2) calculate colour signals for all 24 reflectances:
cSig_1A = S(:,1).*A;
cSig_2A = S(:,2).*A;
.
. %all 24 columns of S
.
cSig_24A = S(:,24).*A;
(3) multiply the sensor-response vectors (columns of R (RGB)) by the colour signals:
% R.*reflectances G.*reflectances B.*reflectances
a1=R(:,1).*cSig_1A; a12=R(:,2).*cSig_1A; a13=R(:,3).* cSig_1A;
b1=R(:,1).*cSig_2A; b12=R(:,2).*cSig_2A; b13=R(:,3).* cSig_2A;
.
. %all 24 signals (think this is correct)
.
x1=R(:,1).*cSig_24A; x12=R(:,2).*cSig_24A; x13=R(:,3).*cSig_24A;
Assuming I've done the previous steps correctly, I'm not sure how to sum the results over wavelengths so that only 72 values are left, and then how to create an image from them.
Maybe the wording confuses me but if you guys could give me some guidance, that would be great. It's much appreciated. Thanks in advance.
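For what it's worth, a sketch of steps 2-4 in vectorized form, using the variables listed above (the summation over wavelength is just a matrix product; S .* A relies on implicit expansion, so use bsxfun(@times, S, A) on pre-R2016b MATLAB):
C = S .* A;     % 31x24 colour signals: each reflectance times the illuminant (step 2)
rgb = R' * C;   % 3x24: entry (k,j) sums R(:,k).*C(:,j) over all 31 wavelengths (steps 3-4)
% step 5: tile the 24 RGB triplets into 100x100 patches, 4 rows by 6 columns
img = zeros(400, 600, 3);
for j = 1:24
    r = ceil(j/6);      % patch row (1..4)
    c = j - (r-1)*6;    % patch column (1..6)
    img((r-1)*100+(1:100), (c-1)*100+(1:100), :) = repmat(reshape(rgb(:,j),1,1,3), 100, 100);
end
imshow(img / max(img(:)))   % normalize to [0,1] for display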

modem.oqpskmod for BER

Hi, can anyone show how to use modem.oqpskmod for BER? Thanks!
h = modem.oqpskmod
y = modulate(h, values);
g = modem.oqpskdemod(h)
z = demodulate(g, y)
Let's assume that I have an array called values which contains only 1s and 0s.
My question is: how would I calculate the BER? That is, of course, if my code above is correct.
Based on this Wikipedia page, you simply have to compute the number of incorrect bits and divide by the total number of transferred bits to get the bit error rate (BER). If values is the unmodulated input signal and z is the output signal after modulation and demodulation, you can compute it like this:
BER = sum(logical(values(:)-z(:)))/numel(values);
EDIT: I modified the above code just in case you run into two situations:
If z has values other than 0 and 1.
If z is a different size than values (e.g. a row vector versus a column vector).
I don't know if you are ever likely to come across these two situations, but better safe than sorry. ;)
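Putting the pieces together, a minimal end-to-end sketch (this assumes the old modem.* object API; the awgn call and its 10 dB SNR are illustrative additions so that the measured BER is nonzero):
values = randi([0 1], 1000, 1);   % random bit stream
h = modem.oqpskmod;               % modulator (check that InputType matches binary input)
y = modulate(h, values);
y = awgn(y, 10, 'measured');      % add channel noise (illustrative)
g = modem.oqpskdemod(h);          % demodulator built from the modulator's settings
z = demodulate(g, y);
BER = sum(logical(values(:) - z(:))) / numel(values)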