Bit-rate calculation in JPEG 2000 image compression

I'm working on the pre-processing stage of an image compression project. I've been researching image compression methods and have read a lot of articles about JPEG 2000, but I can't find a clear resource on choosing bit rates for the different layers of a color image, or on bit-rate calculation methods in JPEG 2000.
Please give me some pointers, with official references if possible.

There is not a single correct answer. The number of layers and the bit rates associated with those layers should be set according to the needs of the users and systems which will consume the imagery.
The good news is that there is plenty of guidance! Here are two options.
Copy Kakadu
Take a look at this gist. It shows the input parameters for Kakadu's kdu_compress utility. For the -rate flag it states:
When two rates are specified, the number of layers must be 2 or more
and intervening layers will be assigned roughly logarithmically spaced
bit-rates. When only one rate is specified, an internal heuristic
determines a lower bound and logarithmically spaces the layer rates
over the range.
Kakadu is heavily used in industry and its J2K compression software is highly trusted. So if their software uses logarithmic spacing for its quality layers, that might be a reasonable place to start.
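For illustration, here is a minimal Python sketch of how logarithmically spaced layer rates could be computed between two endpoint rates. The endpoint values are arbitrary, and Kakadu's internal heuristic for picking the lower bound is not public; this only mirrors the spacing idea:

    import numpy as np

    def layer_rates(min_rate, max_rate, num_layers):
        """Logarithmically spaced bit rates (bits per pixel), low to high."""
        return np.logspace(np.log10(min_rate), np.log10(max_rate), num_layers)

    # Example: 5 quality layers between 0.25 and 2.0 bpp (arbitrary endpoints)
    print(layer_rates(0.25, 2.0, 5))  # ~[0.25, 0.42, 0.71, 1.19, 2.0]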
Pick a profile in the BIIF Profile for JPEG 2000
On the other hand, you can take a look at the BIIF Profile for JPEG 2000, which provides several compression profiles for different situations. Each profile provides examples giving the number of layers and bit rates. That said, a general message recurs across all the profiles, for example:
D.4.7 (NPJE profile) of BIIF:
Some systems may change the exact bitrates and number of layers to
meet application requirements or quality requirements.
F.7 (TPJE profile) of BIIF:
Note that the actual target bit rates and numbers of Quality Layers
used for a specific sensor would need to be optimized to accommodate
any unique properties of that particular imagery collection system.
G.2.1.2 (LPJE profile) of BIIF:
Applications need a sufficient number of codestream layers for optimal
bandwidth and memory management across such a wide range of
resolutions. Layers provide an elegant means to control visual quality
and manage channel capacity for streaming compressed imagery over low
bandwidth links... LPJE does not require a specific layer structure
since the goal of this preferred encoding is to accommodate hardware
and software implementations.

Related

What is the purpose of `kCGImageSourceShouldAllowFloat` for Image I/O?

When you use Image I/O on macOS, there's an option kCGImageSourceShouldAllowFloat which is documented as follows:
Whether the image should be returned as a CGImage object that uses floating-point values, if supported by the file format. CGImage objects that use extended-range floating-point values may require additional processing to render in a pleasing manner.
But it doesn’t say what file formats support it or what the benefits are, just that it might be slower.
Does anyone know what file formats support this and what the benefits would be?
TIFF files support floating-point values. For example, the 128-bits-per-pixel format accepts 32-bit float components. See About Bitmap Images and Image Masks. Also see Supported Pixel Formats for a table of supported pixel formats for graphics contexts.
In terms of the benefits of floating point, 32 bits per channel simply means that you have more possible gradations of color per channel. In general you can't see this with the naked eye (beyond 16 bits per channel), but if you start applying adjustments (traditionally, multiple curves or levels adjustments), you're less likely to experience posterization of the images. So, if (a) the image already has this level of information, and (b) you might need to perform these sorts of adjustments, then the added data of 32 bits per component might have benefits. Otherwise the benefits of this amount of information are somewhat limited.
Bottom line, use floating point if you may be editing assets that already have floating-point components. But often we don't need or use this level of information; most of the JPG and PNG assets we deal with are 8 bits per component anyway.
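If you want to experiment with the option, here is a minimal sketch using the pyobjc bindings (the file path is a placeholder; the same Image I/O calls are available from Swift and Objective-C):

    # Minimal sketch using pyobjc (pip install pyobjc-framework-Quartz).
    # The file path is a placeholder; point it at a float-capable TIFF.
    from Foundation import NSURL
    from Quartz import (CGImageSourceCreateWithURL,
                        CGImageSourceCreateImageAtIndex,
                        CGImageGetBitsPerComponent,
                        kCGImageSourceShouldAllowFloat)

    url = NSURL.fileURLWithPath_("/tmp/example.tiff")
    options = {kCGImageSourceShouldAllowFloat: True}
    source = CGImageSourceCreateWithURL(url, options)
    image = CGImageSourceCreateImageAtIndex(source, 0, options)

    # For a 128-bits-per-pixel float TIFF this should report 32.
    print(CGImageGetBitsPerComponent(image))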

What kind of features are extracted with the AlexNet layers?

My question is about this method, which extracts features from the FC7 layer of AlexNet.
What kind of features is it actually extracting?
I used this method on images of paintings by two artists. The training set is about 150 images from each artist (so the features are stored in a 300x4096 matrix); the validation set is 40 images. This works really well: 85-90% correct classification. I would like to know why it works so well.
WHAT FEATURES?
FC8 is the classification layer; FC7 is the one before it, where all of the prior kernel pixels are linearised and concatenated. These represent the abstract, top-level features that the model training has "discovered". To examine these features, try one of the many layer visualization tools available online (don't ask for references here; SO bans requests for resources).
The features vary from one training to another, depending on the kernel initialization (usually random) and very dependent on the training set. However, the features tend to be simple in the early layers, with greater variety and detail in the later ones. For instance, on the original AlexNet target (ILSVRC 2012, a.k.a. ImageNet data set), the FC7 features often include vehicle tires, animal faces, various types of flower petals, green leaves and stems, two-legged animal torsos, airplane sections, car/truck/bus grill work, etc.
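If you want to pull FC7 activations yourself, here is a minimal sketch using torchvision's pretrained AlexNet (not necessarily the method the question links to; the layer indices follow torchvision's AlexNet definition, where classifier[:6] ends just after FC7's ReLU):

    # Minimal FC7-extraction sketch with torchvision's pretrained AlexNet.
    import torch
    from torchvision import models

    model = models.alexnet(weights=models.AlexNet_Weights.IMAGENET1K_V1)
    model.eval()

    def fc7_features(batch):
        """batch: float tensor (N, 3, 224, 224), ImageNet-normalised."""
        with torch.no_grad():
            x = model.avgpool(model.features(batch)).flatten(1)
            for layer in model.classifier[:6]:  # stop before FC8
                x = layer(x)
        return x  # shape (N, 4096), one row per image

    print(fc7_features(torch.randn(2, 3, 224, 224)).shape)  # (2, 4096)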
Does that help?
WHY DOES IT WORK SO WELL?
That depends on the data set and training parameters. How different are the images from the two artists? There are plenty of features to extract: choice of subject, palette, compositional complexity, hard/soft edges, even the direction of brush strokes. For instance, differentiating any two early Cubists could be a little tricky; telling Rembrandt from Jackson Pollock should hit 100%.

Deep-learning for mapping large binary input

This question may come across as too broad, but I will try to make every sub-topic as specific as possible.
My setting:
Large binary input (2-4 KB per sample) (no images)
Large binary output of the same size
My target: Using Deep Learning to find a mapping function from my binary input to the binary output.
I have already generated a large training set (> 1'000'000 samples), and can easily generate more.
In my (admittedly limited) knowledge of neural networks and deep learning, my plan was to build a network with 2000 or 4000 input nodes, the same number of output nodes, and try different numbers of hidden layers.
Then I would train the network on my data set (waiting several weeks if necessary) and check whether there is a correlation between input and output.
Would it be better to input my binary data as single bits into the net, or as larger entities (like 16 bits at a time, etc)?
For bit-by-bit input:
I have tried "Neural Designer", but the software crashes when I try to load my data set (even on small ones with 6 rows), and I had to edit the project save files to set Input and Target properties. And then it crashes again.
I have tried OpenNN, but it tries to allocate a matrix of size (hidden_layers * input nodes) ^ 2, which, of course, fails (sorry, no 117GB of RAM available).
Is there a suitable open-source framework available for this kind of
binary mapping function regression? Do I have to implement my own?
Is Deep learning the right approach?
Does anyone have experience with this kind of task?
Sadly, I could not find any papers on deep learning + binary mapping.
I will gladly add further information, if requested.
Thank you for providing guidance to a noob.
You have a dataset containing pairs of binary-valued vectors with a maximum length of 4,000 bits, and you want to create a mapping function between the pairs. On the surface, that doesn't seem unreasonable: a 64x64 image with binary pixels contains 4,096 bits of data, and that is well within the reach of modern neural networks.
As you're dealing with binary values, a multi-layered Restricted Boltzmann Machine would seem like a good choice. How many layers you add to the network really depends on the level of abstraction in the data.
You don’t mention the source of the data, but I assume you expect there to be a decent correlation. Assuming the location of each bit is arbitrary and is independent of its near neighbours, I would rule out a convolutional neural network.
A good open source framework to experiment with is Torch - a scientific computing framework with wide support for machine learning algorithms. It has the added benefit of utilising your GPU to speed up processing thanks to its CUDA implementation. This would hopefully avoid you waiting several weeks for a result.
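As a concrete starting point, here is a minimal PyTorch sketch of the plain feed-forward variant the question describes: 4,000 binary inputs and outputs with a per-bit cross-entropy loss. It is a stand-in baseline rather than the RBM stack suggested above, and the layer sizes and hyperparameters are arbitrary placeholders:

    import torch
    import torch.nn as nn

    n_bits = 4000  # one input/output node per bit, as proposed in the question

    model = nn.Sequential(
        nn.Linear(n_bits, 2048), nn.ReLU(),
        nn.Linear(2048, 2048), nn.ReLU(),
        nn.Linear(2048, n_bits),  # logits; the sigmoid is folded into the loss
    )
    loss_fn = nn.BCEWithLogitsLoss()
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)

    # Dummy batch standing in for the real dataset
    x = torch.randint(0, 2, (64, n_bits)).float()
    y = torch.randint(0, 2, (64, n_bits)).float()

    opt.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    opt.step()
    print(float(loss))  # ~0.69 (ln 2) before any real training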
If you provide more background, then maybe we can home in on a solution…

Image based steganography that survives resizing?

I am using a StarTech capture card to capture video from a source machine. I encoded that video using MATLAB so that every frame contains a marker. I play the video on the source computer (HDMI out), which is connected via HDMI to my computer (HDMI in). Once I capture a frame as a bitmap (1920x1080), I resize it to 1280x720 and send it for processing; the processing code checks every pixel for the marker.
The issue is that my capture card can only capture at 1920x1080, whereas the video is 1280x720. So in order to recover the marker, I downscale the captured frame to 1280x720, which I believe alters the entire pixel array, and hence I am not able to recover the marker I embedded in the video.
In other words, during capture the image is upscaled, which changes the pixel values.
I have been going through a few research papers on steganography, but they haven't helped so far. Is there any technique that can survive image resizing and preserve the embedded pixel values?
Any suggestions or pointers will be really appreciated.
My advice is to start by searching for alternative software that doesn't rescale, compress or otherwise modify the extracted frames before handing them over to your control. That may save you many headaches and days' worth of time. If you insist on implementing, or are forced to implement, a steganography algorithm that survives resizing, keep reading.
I can't provide a specific solution because there are many ways this can (possibly) be achieved, and they are complex. However, I'll describe the ingredients a solution will most likely involve, and the limitations of such an approach.
Resizing a cover image is considered an attack, i.e. an attempt to destroy the secret. Other such examples include lossy compression, noise, cropping, rotation and smoothing. Robust steganography is the medicine for that, but it isn't all-powerful; it may only provide resistance to specific types of attacks, and/or only to small-scale attacks at that. You need to find or design an algorithm that suits your needs.
For example, let's take a simple pixel lsb substitution algorithm. It modifies the lsb of a pixel to match the bit you want to embed. Now consider an attack where someone randomly applies a pixel change of -1 25% of the time, 0 50% of the time and +1 25% of the time. Effectively, half of the time it will flip your embedded bit, but you don't know which pixels are affected, which makes extraction impossible. However, you can alter your embedding algorithm to be resistant to this type of attack. You know the absolute value of the maximum change is 1. If you embed your secret bit, s, in the 3rd lsb, while setting the last 2 lsbs to 01, you are guaranteed to survive the attack. More specifically, an 8-bit pixel becomes xxxxxs01 in binary.
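To make the xxxxxs01 idea concrete, here is a minimal sketch (grayscale pixels as plain integers). The pinned 01 tail absorbs any ±1 change, so the 3rd lsb, and with it the secret bit, survives the simulated attack:

    import random

    def embed(pixel, s):
        # Clear the low 3 bits, set the 3rd lsb to s, pin the tail to 01
        return (pixel & ~0b111) | (s << 2) | 0b01

    def extract(pixel):
        return (pixel >> 2) & 1

    p = embed(149, 1)
    attacked = p + random.choice([-1, 0, 0, 1])  # -1 25%, 0 50%, +1 25%
    print(extract(attacked))  # always 1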
Let's examine what we have sacrificed in order to survive such an attack. Assuming our embedding bit and the lsbs that can be modified all have uniform probabilities, the probability of changing the original pixel value with the simple algorithm is
change | probability
-------+------------
0 | 1/2
1 | 1/2
and with the more robust algorithm
change | probability
-------+------------
0 | 1/8
1 | 1/4
2 | 3/16
3 | 1/8
4 | 1/8
5 | 1/8
6 | 1/16
That's going to affect our PSNR quite a bit if we embed a lot of information. But we can do a bit better than that if we employ the optimal pixel adjustment method. This algorithm minimises the Euclidean distance between the original value and the modified one; in simpler terms, it minimises the absolute difference. For example, assume you have a pixel with binary value xxxx0111 and you want to embed a 0. This means you have to make the last 3 lsbs 001. With a naive substitution, you get xxxx0001, which has a distance of 6 from the original value. But xxxx1001 (adding 8 by flipping the 4th lsb, instead of subtracting 6) has a distance of only 2.
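A minimal sketch of that adjustment: after substituting the low 3 bits, a change of ±8 leaves them intact, so we pick whichever candidate lands closest to the original value:

    def embed_opap(pixel, low3):
        """Embed 3 bits with optimal pixel adjustment (8-bit grayscale)."""
        base = (pixel & ~0b111) | low3
        candidates = [c for c in (base - 8, base, base + 8) if 0 <= c <= 255]
        return min(candidates, key=lambda c: abs(c - pixel))

    print(embed_opap(0b00000111, 0b001))  # 9 = 0b00001001, distance 2 not 6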
Now, let's assume that the attack can induce a change of 0 33.3% of the time, 1 33.3% of the time and 2 33.3%. Of that last 33.3%, half the time it will be -2 and the other half it will be +2. The algorithm we described above can actually survive a +2 modification, but not a -2. So 16.6% of the time our embedded bit will be flipped. But now we introduce error correcting codes. If we apply such a code that has the potential to correct on average 1 error every 6 bits, we are capable of successfully extracting our secret despite the attack altering it.
Error correction generally works by adding some sort of redundancy. So even if part of our bit stream is destroyed, we can refer to that redundancy to retrieve the original information. Naturally, the more redundancy you add, the better the error correction rate, but you may have to double the redundancy just to improve the correction rate by a few percent (just arbitrary numbers here).
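The simplest illustration of that trade-off is a repetition code: triple each bit and take a majority vote on extraction. It corrects any single flipped copy per bit, at three times the embedding cost; real codes such as BCH or Reed-Solomon achieve far better rates. A minimal sketch:

    def encode(bits):
        return [b for b in bits for _ in range(3)]

    def decode(bits):
        return [int(sum(bits[i:i + 3]) >= 2) for i in range(0, len(bits), 3)]

    sent = encode([1, 0, 1])  # [1,1,1, 0,0,0, 1,1,1]
    sent[1] ^= 1              # the channel flips one copy
    print(decode(sent))       # [1, 0, 1] recovered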
Let's appreciate how much information you can hide in a 1280x720 (grayscale) image: at 1 bit per pixel, 8 bits per letter and ~5 letters per word, you can hide roughly 20k words. That's a respectable portion of an average novel. It's enough to hide your stellar Masters dissertation, which you even published, in your graduation photo. But with 4 bits of redundancy per 1 bit of actual information, you're only looking at hiding that boring essay you wrote once, which didn't even get the best mark in the class.
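Spelling out the arithmetic:

    pixels  = 1280 * 720    # 921,600 pixels, 1 embedded bit each
    letters = pixels // 8   # 115,200 letters at 8 bits per letter
    words   = letters // 5  # ~5 letters per word
    print(words)            # 23040, rounded down to ~20k above
    print(words // 5)       # 4608 words once each data bit carries 4 redundancy bits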
There are other ways you can embed your information. For example, certain methods in the frequency domain can be more resistant to pixel modifications. The downsides of such methods are increased complexity in coding the algorithm and reduced hiding capacity. That's because some frequency coefficients are resistant to changes but make embedding modifications easily detectable, others are fragile to changes but hard to detect, and some lie in the middle. So you compromise and use only a fraction of the available coefficients. Popular frequency transforms used in steganography are the Discrete Cosine Transform (DCT) and the Discrete Wavelet Transform (DWT).
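For a flavour of the frequency-domain approach, here is a toy sketch (not from any particular paper) that embeds one bit in the parity of a quantised mid-frequency DCT coefficient of an 8x8 block. The coefficient position and quantisation step are arbitrary choices; real schemes select coefficients far more carefully:

    import numpy as np
    from scipy.fft import dctn, idctn

    def embed_block(block, bit, q=16, pos=(2, 1)):
        coeffs = dctn(block.astype(float), norm='ortho')
        k = int(np.round(coeffs[pos] / q))  # quantise the chosen coefficient
        if (k & 1) != bit:                  # force its parity to the secret bit
            k += 1
        coeffs[pos] = k * q
        return idctn(coeffs, norm='ortho')

    def extract_block(block, q=16, pos=(2, 1)):
        coeffs = dctn(block.astype(float), norm='ortho')
        return int(np.round(coeffs[pos] / q)) & 1

    block = np.random.randint(0, 256, (8, 8))
    print(extract_block(embed_block(block, 1)))  # 1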
In summary, if you want a robust algorithm, the consistent themes that emerge are sacrificing capacity and applying stronger distortions to your cover medium. There have been quite a few studies on robust steganography for watermarks. That's because you want your watermark to survive any attack so you can prove ownership of the content, and watermarks tend to be very small, e.g. a 64x64 binary image icon (that's only 4,096 bits). Even then, some algorithms are robust enough to recover the watermark almost intact, say 70-90%, so that it's still comparable to the original watermark. In some cases this is considered good enough. You'd require an even more robust algorithm (bigger sacrifices) if you want lossless retrieval of your secret data 100% of the time.
If you want such an algorithm, you'll want to comb the literature for one and test any candidates to see if they meet your needs. But don't expect anything that takes only 15 lines to code and 10 minutes of reading to understand. Here is a paper that looks like a good start: Mali et al. (2012). Robust and secured image-adaptive data hiding. Digital Signal Processing, 22(2), 314-323. Unfortunately, the paper is not open access and you will need either a subscription or academic access in order to read it. But then again, that's true for most of the papers out there. You said you've read some papers already, and in previous questions you've stated you're working on a college project, so access for you may be likely.
For this specific paper, Table 4 shows the results of resisting a resizing attack, and Section 4.4 discusses them. The authors don't explicitly claim 100% recovery, only a faithful reproduction. Also notice that the attacks were of the scale of 5-20% resizing, and that the scheme only allows for a few thousand embedded bits. Finally, the resizing method (nearest neighbour, cubic, etc.) matters a lot in surviving the attack.
I have designed and implemented ChromaShift: https://www.facebook.com/ChromaShift/
If done right, steganography can resiliently (i.e. robustly) encode identifying information (e.g. downloader user id) in the image medium while keeping it essentially perceptually unmodified. Compared to watermarks, steganography is a subtler yet more powerful way of encoding information in images.
The information is dynamically multiplexed into the Cb/Cr fabric of the JPEG by chroma-shifting pixels by a configurable small bump value. As the human eye is more sensitive to luminance changes than to chrominance changes, chroma-shifting is virtually imperceptible while providing a way to encode arbitrary information in the image. The ChromaShift engine does both watermarking and pure steganography. Both DRM subsystems are configurable via a rich set of options.
The solution is developed in C, for the Linux platform, and uses SWIG to compile into a PHP loadable module. It can therefore be accessed by PHP scripts while providing the speed of a natively compiled program.

Transferring data using ultrasound

Yamaha InfoSound and the ShopKick application use technologies that allow data transfer using ultrasound, that is, playing an inaudible signal (>18 kHz) that can be picked up by modern mobile phones (iOS, Android).
What is the approach used in such technologies? What kind of modulation do they use?
I see several problems with this approach. First, 18 kHz is not inaudible. Many people cannot hear it, especially as they age, but I know I certainly can (I take regular hearing tests; it's work-related). Also, most phones have different low-pass filters on their A/D converters, and many devices, especially older Android ones (I've personally seen this happen), filter out everything above 16 kHz or so. Your app therefore is not guaranteed to work on every device. The iPhone should probably be able to do it.
In terms of modulation, it could be anything really, but I would definitely rule out AM: sound has next to zero robustness when it comes to amplitude. If I were to implement something like this, I would go with FSK; I would expect PSK to fail due to acoustic reflections and the like. The difficulty is that you're working with non-robust energy transfer within a very narrow bandwidth. I certainly don't doubt that it can be achieved, but I don't see something like this proving reliable. Just IMHO.
Update: Now that I think about it, plain on-off keying of a single tone would work if you're not transferring any data, just some short signals.
I can't say for Yamaha InfoSound and ShopKick, but what we used in our project was a variation of frequency modulation: the frequency of the carrier is modulated by a digital binary signal, where 0 and 1 correspond to 17 kHz and 18 kHz respectively. As for the demodulator, we tried a heterodyne. You can find more details here: http://rnd.azoft.com/mobile-app-transering-data-using-ultrasound/
There's nothing special about it being ultrasound; the principle is the same as data transmission through a modem, so any digital modulation is, in principle, feasible. You just have a specific frequency band (above 18 kHz) and some practical constraints (the medium is very unreliable, I guess) that suggest using a simple, robust scheme with a low bit rate.
I don't know how they do it, but this is how I do it:
If it is a string, make sure it's not a long one (the longer it is, the higher the error probability). Let's assume we're working with the vital part of the ASCII code, namely up to character number 127; then all you need is 7 bits per character. Transform each character into bits and modulate those bits using QFSK (there are several modulations to choose from; frequency-shift-based ones have turned out to be the most robust of the conventional ones I've tried, and I've also created my own modulation scheme for this use case). Select the carrier frequencies as 18.5, 19, 19.5 and 20 kHz (if you want to be mathematically strict in your design, select frequency values that assure both orthogonality and phase continuity at symbol transitions; if you can't, a good workaround to avoid abrupt symbol transitions is to multiply your symbols by a window of the same size, e.g. a Gaussian or Bartlett window). In my experience you can move these values within the range from 17.5 to 20.5 kHz (if you go lower, it will start to bother people using your app; if you go higher, the average microphone's frequency response will attenuate your transmission and induce unwanted errors).
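A minimal numpy sketch of that transmit side: 4-FSK with the carriers above and Bartlett-windowed symbols to soften the transitions. The sample rate and symbol length are illustrative placeholders:

    import numpy as np

    FS = 44100                             # sample rate, Hz
    SYMBOL_LEN = 1024                      # samples per symbol
    FREQS = [18500, 19000, 19500, 20000]   # one carrier per 2-bit symbol

    def modulate(bits):
        window = np.bartlett(SYMBOL_LEN)   # softens symbol transitions
        t = np.arange(SYMBOL_LEN) / FS
        out = []
        for i in range(0, len(bits), 2):
            sym = bits[i] * 2 + bits[i + 1]
            out.append(window * np.sin(2 * np.pi * FREQS[sym] * t))
        return np.concatenate(out)

    signal = modulate([0, 1, 1, 0, 1, 1])  # three symbols
    print(signal.shape)                    # (3072,)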
On the receiver side, implement a correlation or matched-filter receiver (an FFT receiver works as well, especially a zero-padded one, though it might be a little slower; I wouldn't recommend Goertzel, because frequency shifts due to the Doppler effect or speaker-microphone non-linearities could affect your reception). Once you have received the bit stream, assemble the characters from the bits and you will recover your message.
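A matching sketch of the zero-padded FFT receiver, continuing from the code above and assuming symbol timing is already known (synchronization, as noted below, is the hard part):

    def demodulate(signal):
        bits = []
        nfft = 8 * SYMBOL_LEN                       # zero-padded FFT
        freqs = np.fft.rfftfreq(nfft, 1 / FS)
        bins = [int(np.argmin(np.abs(freqs - f))) for f in FREQS]
        for i in range(0, len(signal), SYMBOL_LEN):
            spectrum = np.abs(np.fft.rfft(signal[i:i + SYMBOL_LEN], nfft))
            sym = int(np.argmax(spectrum[bins]))    # strongest carrier wins
            bits += [sym >> 1, sym & 1]
        return bits

    print(demodulate(signal))  # [0, 1, 1, 0, 1, 1]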
If you face too many transmission errors, try using a higher number of samples per symbol, or band-pass filtering each frequency before handing the signal to the demodulator; using an error-correcting code such as BCH or Reed-Solomon is sometimes the only way to assure error-free communication.
One topic everybody always forgets to talk about is synchronization (knowing, on the receiver side, when the transmission has begun). You have to be creative here and run a lot of tests with a lot of phones before you can derive an actual detection threshold that works on all of them; note that this might also be distance-dependent.
If you are unfamiliar with these subjects I would recommend a couple of great books:
Digital Modulation Techniques by Fuqin Xiong
Digital Communications: Fundamentals and Applications by Bernard Sklar
Digital Communications by John G. Proakis
You might have luck with a library I created for sound-based modems, libquiet. It gives you a handful of profiles to work from, including a slow "Ultrasonic whisper" profile with spectral content above 19 kHz. The library is written in C but would require some work to interface with iOS.