I am currently reading Fundamentals of Multimedia by Ze-Nian Li
In the book there is a sample problem which I can't quite solve, even though I seem to understand entropy and arithmetic encoding/decoding.
You are given a data stream that has been compressed to a length of 100,000 bits and are told that it is the result of running an "ideal" entropy coder on a sequence of data. You are also told that the original data consists of samples of a continuous waveform, quantized to 2 bits per sample. The probabilities of the uncompressed values are as follows:
00 - 8/16
01 - 6/16
10 - 1/16
11 - 1/16
How would I figure out the approximate length of the uncompressed signal?
Since you "seem to understand entropy", you can simply use the formula for entropy to do your homework. Here is a hint from Boltzmann's tombstone: S = k · log W.
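If you want to sanity-check the result, here is a minimal MATLAB sketch of that calculation, assuming the "ideal" coder hits the entropy bound exactly:
% entropy of the source, from the probabilities in the question
p = [8 6 1 1] / 16;
H = -sum(p .* log2(p));            % about 1.53 bits per (2-bit) sample
nSamples = 100000 / H;             % compressed bits / bits-per-sample
uncompressedBits = 2 * nSamples;   % each original sample was quantized to 2 bits
fprintf('roughly %.0f samples, %.0f bits uncompressed\n', nSamples, uncompressedBits)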
I am working on a project in implementing HEVC intra-prediction with MATLAB.
I have read many articles to help write the code in MATLAB, and I have finally done that.
(One of the most useful was this article: Intra Coding of the HEVC Standard.)
The main purpose of the project is a comparison between HEVC and AVC intra-prediction, to show that HEVC gives better quality for the reconstructed image than AVC does; for this reason, the final SAE (sum of absolute errors) from HEVC should be less than the one from AVC.
Unlike a real HEVC encoder/decoder, which divides blocks dynamically according to the amount of detail in each area of an image, my supervisor wants me to use a fixed block size for each intra-prediction run on an image, i.e. once with 64x64, once with 32x32, and so on down to 4x4.
Now I have a big problem in my work: the SAE of HEVC is by far larger than that of AVC, and I don't know why.
If needed, let me know and I will post my code later.
Also, I have some doubts and questions about implementing HEVC intra-prediction:
1- Does anything in the linear interpolation function below and its related parameters (from the cited article) change with the block size, or is it always the same for different block sizes?
P_{x,y} = ((32 − w_y) · R_{i,0} + w_y · R_{i+1,0} + 16) >> 5
c_y = (y · d) >> 5
w_y = (y · d) & 31
2- Is the shift operator (>>) like a normal division (for example, >> 5 equals dividing a signed number by 32), or is it a binary shift of the signed number's bits?
(I say signed number because of the negative displacement associated with some angular modes; also, note that a bit-wise shift of an unsigned number can give a totally different result than a shift of a signed one.)
3- For computing the cost of each mode, I used the SAE (sum of absolute errors) as a replacement for the full cost function, for simplicity.
C = D_{Had} + λ · R_{mode} (the HEVC cost function)
Do you think using SAE instead of the HEVC cost function will affect the process of choosing the best mode for each block? If so, do you have a more accurate method than SAE as a replacement for the HEVC cost function to choose the best prediction mode for each pixel?
4- For comparison between H.265 (HEVC) and H.264 (AVC) intra-prediction, the total SAE of a picture reconstructed by HEVC should be less than that of AVC. However, this is not the case in my results: the SAE of AVC is less than that of HEVC.
I cannot find the cause of this problem. Can someone help me?
1 - Actually, the formula for the linear interpolation mentioned in this publication is not quite right. According to Section 8.4.4.2.3 "Filtering process of neighbouring samples" of the H.265 standard, it should be:
P_{x,y} = ((63 − w_y) · R_{i,0} + w_y · R_{i+1,0} + 32) >> 6
Look at the standard for more information. Regarding your question about adapting some of the numbers depending on the block size: this so-called "strong filtering" should only be applied to the reference pixels of 32x32 intra blocks. For smaller blocks, only the "Reference Sample Smoothing" from your article can be used. Again, check the same section in the standard if you want the details.
2 - In these cases the shift operator denotes a bit shift of the absolute value. Be careful when bit-shifting signed numbers in MATLAB: some functions shift the absolute value, while others shift the two's-complement representation and take the sign into account.
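As a quick illustration of the difference (a hedged sketch; the example value is made up, and you should verify how bitshift treats signed integer types on your MATLAB version):
x = int32(-37);
arithShift = floor(double(x) / 32);   % arithmetic ">> 5", i.e. floor division -> -2
truncDiv = idivide(x, int32(32));     % idivide rounds toward zero by default -> -1
b = bitshift(x, -5);                  % MATLAB's own bitshift; check its signed behaviour on your version
fprintf('floor: %d, trunc: %d, bitshift: %d\n', arithShift, truncDiv, b)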
3 - Since your "project is a comparison between HEVC and AVC intra-prediction to show HEVC will give better quality", I guess it makes sense to just use SAE or the sum of squared errors (SSE). If you do some kind of quality/bitrate evaluation like the HEVC cost function, you'd need to add a lot more than just intra prediction to your project in order to compare both standards adequately, in my opinion.
4 - You are right, the result should be the other way round. Check your calculation of the SAE. Also check with the stated section in the standard whether you do the reference sample filtering correctly.
Other stuff:
1 - While you can have 64x64 inter prediction blocks in HEVC, you can only have up to 32x32 intra blocks.
2 - Take care when using integers in MATLAB; this once screwed up all of my computations too. Think about whether the number of bits of the integers you use is sufficient, or switch to doubles. When you load an image, the values are 8-bit unsigned integers by default, and you have to cast them for some computations.
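A tiny example of the saturation pitfall (the values are arbitrary):
a = uint8(200); b = uint8(100);
saturated = a + b                   % uint8 arithmetic saturates: gives 255, not 300
correct = double(a) + double(b)     % cast to double first: gives 300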
My objective is to enhance 8-bit images to 16-bit ones; in other words, I want to increase the dynamic range of an 8-bit image. To do that, I can sequentially take multiple 8-bit images of a fixed scene with a fixed camera. To simplify the issue, let's assume they are grayscale images.
Intuitively, I think I can achieve the goal by
Multiplying two 8-bit images:
resimage = double(img1) .* double(img2)
Averaging a specified number of 8-bit images:
resImage = mean(images,3)
assuming images(:,:,i) contains the i-th 8-bit image.
After that, I can convert the resulting image to a 16-bit one:
resImage = uint16(resImage)
But before testing these methods, I wonder whether there is another way to do this (other than buying a 16-bit camera); pointers to literature on this subject would be even better.
UPDATE: As the comments below show, I got great information on the drawbacks of the simple averaging above and on image stacks for the enhancement, so it may be a good topic to study after all. Thank you all for your great comments.
This question appears to relate to increasing the Dynamic Range of an image by integrating information from multiple 8 bit exposures into a 16 bit image. This is related to the practice of capturing and combining "image stacks" in astronomical imaging among other fields. An explanation of this practice and how it can both reduce image noise, and enhance dynamic range is available here:
http://keithwiley.com/astroPhotography/imageStacking.shtml
The idea is that successive captures of the same scene are subject to image noise, and this noise leads to stochastic variation of the captured pixel values. In the simplest case these variations can be leveraged by summing and dividing, i.e. mean-averaging the stack, to improve its dynamic range, but the practicality depends very much on the noise characteristics of the camera.
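A minimal MATLAB sketch of that summing-and-rescaling step, assuming (as in the question) that images(:,:,i) holds the i-th 8-bit grayscale frame of a static scene:
N = size(images, 3);
acc = sum(double(images), 3);            % accumulate in double to avoid overflow
scaled = acc * (65535 / (255 * N));      % map the [0, 255*N] sum onto the 16-bit range
result16 = uint16(round(scaled));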
You want to sum many images together, assuming there is no jitter and the camera is steady. Accumulate a large sum and then divide by some amount.
Note that to get a reasonable 16-bit image from an 8-bit source, you'd need to take hundreds of images. Jitter will distort edge information, and the camera's inherent noise floor may mean you are essentially 'grinding metal'. In a practical sense, you might get 2 or 3 more bits of data from image summing, but not 8 more. To get 3 more bits would require summing at least 64 images (6 bits), then dividing by 8 (3 bits), as the lower bits are garbage.
The rule of thumb is that to gain n extra bits of data you need roughly (2^n)^2 images, because the SNR improvement goes with the square root of the number of samples: 3 extra bits (a factor of 8) needs 64 images, 4 bits would need 256 images, etc.
Here's a link that talks about sampling:
http://electronicdesign.com/analog/understand-tradeoffs-increasing-resolution-averaging
"In fact, it can be shown that the improvement is proportional to the square root of the number of samples in the average."
Note that SNR is expressed on a log scale (roughly 6 dB per bit), so translating it into bits is reasonable.
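Putting the square-root relation and the rule of thumb together, a quick sanity check in MATLAB (assuming purely noise-limited averaging):
extraBits = @(N) log2(sqrt(N));    % extra bits of resolution from averaging N frames
extraBits([16 64 256])             % -> 2  3  4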
I'm trying to cluster a large (gigabyte) dataset. In order to cluster, you need the distance of every point to every other point, so you end up with an N^2-sized distance matrix, which in the case of my dataset would be on the order of exabytes. pdist in MATLAB blows up instantly, of course ;)
Is there a way to cluster subsets of the large data first, and then maybe do some merging of similar clusters?
I don't know if this helps any, but the data are fixed-length binary strings, so I'm calculating their distances using the Hamming distance (distance = number of set bits in string1 XOR string2).
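If the strings are packed into integer words, that XOR-based distance is easy to compute in MATLAB; a minimal sketch, where the packing into four uint32 words is just an assumption for illustration:
hamdist = @(a, b) sum(sum(dec2bin(bitxor(a, b), 32) == '1'));   % popcount of the XOR
a = uint32([3 0 0 0]);             % two example 128-bit strings, packed as 4 words each
b = uint32([1 0 0 8]);
d = hamdist(a, b)                  % -> 2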
A simplified version of the nice method from
Tabei et al., Single versus Multiple Sorting in All Pairs Similarity Search,
say for pairs with Hammingdist 1:
sort all the bit strings on the first 32 bits
look at blocks of strings where the first 32 bits are all the same;
these blocks will be relatively small
pdist each of these blocks, catching pairs with Hammingdist(left 32) = 0 and Hammingdist(the rest) <= 1.
This misses the fraction (e.g. 32/128) of the nearby pairs which have
Hammingdist(left 32) = 1 and Hammingdist(the rest) = 0.
If you really want these, repeat the above with "first 32" -> "last 32".
The method can be extended.
Take for example Hammingdist <= 2 on 4 32-bit words; the mismatches must split like one of
2000 0200 0020 0002 1100 1010 1001 0110 0101 0011,
so at least 2 of the words must have 0 mismatches; sort on those words in the same way.
(Btw, sketchsort-0.0.7.tar is 99 % src/boost/, build/, .svn/ .)
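Here is a rough MATLAB sketch of the Hamming-distance-1 case described above, assuming the 128-bit strings are packed as rows of an N-by-4 uint32 matrix W (the names and the packing are illustrative, not from the paper):
[S, order] = sortrows(W, 1);                   % sort on the first 32-bit word
[~, first] = unique(S(:, 1), 'first');         % start of each block of equal first words
last = [first(2:end) - 1; size(S, 1)];
for blk = 1:numel(first)
    for i = first(blk):last(blk)
        for j = (i + 1):last(blk)
            diffBits = dec2bin(bitxor(S(i, 2:4), S(j, 2:4)), 32) == '1';
            if sum(diffBits(:)) <= 1
                % candidate pair: original indices order(i) and order(j)
            end
        end
    end
end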
How about sorting them first? Maybe something like a modified merge sort? You could start with chunks of the dataset which will fit in memory to perform a normal sort.
Once you have the sorted data, clustering could be done iteratively. Maybe keep a rolling centroid of N-1 points and compare against the Nth point being read in. Then depending on your cluster distance threshold, you could pool it into the current cluster or start a new one.
The EM-tree and K-tree algorithms in the LMW-tree project can cluster problems this big and larger. Our most recent result is clustering 733 million web pages into 600,000 clusters. There is also a streaming variant of the EM-tree where the dataset is streamed from disk for each iteration.
Additionally, these algorithms can cluster bit strings directly where all cluster representatives and data points are bit strings, and the similarity measure that is used is Hamming distance. This minimizes the Hamming distance within each cluster found.
It's part of the OCR process, which is:
How to segment sentences into words, and then into characters?
What would be a candidate algorithm for this task?
As a first pass:
process the text into lines
process a line into segments (connected parts)
find the largest white band that can be placed between each pair of segments.
look at the sequence of widths and select "large" widths as white space.
everything between white space is a word.
Now all you need is a good enough definition of "large".
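A toy MATLAB sketch of that idea on a single binarized text line, where lineImg is assumed to be a logical image with true = ink and the threshold for "large" is a deliberately crude guess:
colHasInk = any(lineImg, 1);                       % which columns contain ink
gaps = bwconncomp(~colHasInk);                     % runs of white columns
widths = cellfun(@numel, gaps.PixelIdxList);
isLarge = widths > 0.5 * max(widths);              % one crude definition of "large"
sepCols = false(size(colHasInk));
sepCols(vertcat(gaps.PixelIdxList{isLarge})) = true;
words = bwconncomp(~sepCols);                      % remaining runs of columns ~ words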
First, NIST (the National Institute of Standards and Technology) published a protocol known as the NIST Form-Based Handwriting Recognition System about 15 years ago for this exact question, i.e., extracting and preparing text-as-image data for input to machine learning algorithms for OCR. Members of this group at NIST also published a number of papers on this System.
The performance of their classifier was demonstrated by data also published with the algorithm (the "NIST Handwriting Sample Forms.")
Each of the half-dozen or so OCR data sets I have downloaded and used has referenced the data extraction/preparation protocol used by NIST to prepare the data for input to their algorithm. In particular, I am pretty sure this is the methodology relied on to prepare the Boston University Handwritten Digit Database, which is regarded as benchmark reference data for OCR.
So if the NIST protocol is not a genuine standard, at least it's a proven methodology to prepare text-as-image data for input to an OCR algorithm. I would suggest starting there, and using that protocol to prepare your data unless you have a good reason not to.
In sum, the NIST data was prepared by extracting 32 x 32 pixel normalized bitmaps directly from a pre-printed form.
Here's an example:
00000000000001100111100000000000
00000000000111111111111111000000
00000000011111111111111111110000
00000000011111111111111111110000
00000000011111111101000001100000
00000000011111110000000000000000
00000000111100000000000000000000
00000001111100000000000000000000
00000001111100011110000000000000
00000001111100011111000000000000
00000001111111111111111000000000
00000001111111111111111000000000
00000001111111111111111110000000
00000001111111111111111100000000
00000001111111100011111110000000
00000001111110000001111110000000
00000001111100000000111110000000
00000001111000000000111110000000
00000000000000000000001111000000
00000000000000000000001111000000
00000000000000000000011110000000
00000000000000000000011110000000
00000000000000000000111110000000
00000000000000000001111100000000
00000000001110000001111100000000
00000000001110000011111100000000
00000000001111101111111000000000
00000000011111111111100000000000
00000000011111111111000000000000
00000000011111111110000000000000
00000000001111111000000000000000
00000000000010000000000000000000
I believe that the BU data-prep technique subsumes the NIST technique but added a few steps at the end, not with higher fidelity in mind but to reduce file size. In particular, the BU group:
began with the 32 x 32 bitmaps; then
divided each 32 x 32 bitmap into non-overlapping 4 x 4 blocks;
next, they counted the number of activated pixels in each block ("1" is activated; "0" is not);
the result is an 8 x 8 input matrix in which each element is an integer (0-16).
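For what it's worth, that 32 x 32 to 8 x 8 reduction is a one-liner in MATLAB with the Image Processing Toolbox (bmp is assumed to be a 32 x 32 matrix of 0s and 1s):
counts = blockproc(double(bmp), [4 4], @(blk) sum(blk.data(:)));   % 8 x 8, entries 0-16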
For finding a binary sequence like 101000000000000000010000001, detect the sub-sequences 0000, 0001, 001, 01, 1.
I am assuming you are using the Image Processing Toolbox in MATLAB.
To distinguish text in an image, you might want to follow these steps:
Grayscale (speeds up things greatly).
Contrast enhancement.
Erode the image lightly to remove noise (scratches/blips)
Dilation (heavy).
Edge detection (or ROI calculation).
With trial and error, you'll get the proper coefficients such that the image you obtain after the 5th step contains convex regions surrounding each letter/word/line/paragraph.
NOTE:
Essentially, the more you dilate, the larger the elements you get; i.e., light dilation is useful for identifying letters, whereas comparatively heavy dilation is needed to identify lines and paragraphs.
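A rough sketch of that pipeline with Image Processing Toolbox functions; the file name and structuring-element sizes are placeholders you would tune by trial and error:
img = imread('page.png');                   % hypothetical scanned page
gray = rgb2gray(img);                       % 1. grayscale
adj = imadjust(gray);                       % 2. contrast enhancement
bw = ~imbinarize(adj);                      % binarize; invert so that ink = true
clean = imerode(bw, strel('disk', 1));      % 3. light erosion removes specks
blobs = imdilate(clean, strel('disk', 7));  % 4. heavy dilation merges glyphs into regions
boxes = regionprops(blobs, 'BoundingBox');  % 5. ROI calculation for each region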
Online ImgProc MATLAB docs
Check out the "Examples in Documentation" section in the online docs, or refer to the Image Processing Toolbox documentation in the MATLAB Help menu.
The examples given there will guide you to the proper functions to call and their various formats.
Sample CODE (not mine)
I am simulating a 4-stage digital filter.
Stages are:
CIC
half-band
OSR = 128
The input is 4 bits and the output is 24 bits. I am confused about the 24-bit output.
I use MATLAB to generate a 4-bit signed sinusoid input (using the SD tool) and simulate it with ModelSim, so the output should also be a sinusoid. The issue is that the output only contains 4 different values.
For a 24-bit output, shouldn't we get 2^24 − 1 different values?
What's the reason for this? Is it due to internal bit width?
I'm not familiar with ModelSim, and I don't understand the filter terminology you used, but... are your filters linear systems? If so, an input at a given frequency will cause an output at the same frequency, though possibly with different amplitude and phase. If your input signal is a single tone, sampled such that there are four values per cycle, the output will still have four values per cycle. Unless one of the stages performs sample rate conversion, the system is behaving as expected. As Donnie DeBoer pointed out, the word width of the calculation doesn't matter as long as it can represent the four values of the input.
Again, I am not familiar with the particulars of your system so if one of the stages does indeed perform sample rate conversion, this doesn't apply.
Forgive my lack of filter knowledge, but does one of the filter stages interpolate between the input values? If not, then you're only going to get a maximum of 2^4 output values (based on the input resolution), regardless of your output resolution. Just because you output to 24-bit doesn't mean you're going to have 2^24 values... imagine running a digital square wave into a D->A converter. You have all the output resolution in the world, but you still only have 2 values.
It's actually pretty simple:
Even though you have 4 bits of input, your filter coefficients may be more than 4 bits.
Every math stage you do adds bits. If you add two 4-bit values, the answer is a 5 bit number, so that adding 0xf and 0xf doesn't overflow. When you multiply two 4-bit values, you actually need 8 bits of output to hold the answer without the possibility of overflow. By the time all the math is done, your 4-bit input apparently needs 24-bits to hold the maximum possible output.