Is anyone familiar with .BMT files and their structure? - bmp

This might be in the wrong place here, but I am trying to use a simple BMP image in a piece of thermography software called IRSoft. Is anyone maybe familiar with the file type .BMT?
I don't really want to reverse engineer too much but maybe someone else has an idea.

In the status line of IRSoft you can see the resolution of your camera; in my case it is 160x120 pixels. My BMT files always have a size of 230588 bytes, which works out to about 12 bytes per pixel...
It seems to me that the last 160*120*4 = 76800 bytes of the BMT file represent the thermal image:
4 bytes for every pixel. At file offset 153788 I find the upper-left pixel, followed by the rest of the top row. At the last offset, 230584, I find the lower-right pixel.
I don't know the meaning of the rest of the file. Perhaps the real image, reference temperatures...
Do you know how to calculate the temperature out of these values?
This table translates the 4-byte values approximately to temperatures in degrees Celsius (though I am afraid the values differ in other files):
0x41d00000 and above: 26.0°C and above
0x41c00000 and above: 24.0°C and above
0x41b00000 and above: 22.0°C and above
0x41a00000 and above: 20.0°C and above
0x41900000 and above: 18.0°C and above
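One observation, not confirmed by any IRSoft documentation: each hex value in the table is exactly the IEEE 754 single-precision encoding of the listed temperature, so the 4-byte pixel values may simply be float32 temperatures in °C. A minimal Matlab sketch of that guess (the file name and the little-endian byte order are assumptions):
vals = uint32([hex2dec('41d00000') hex2dec('41c00000') hex2dec('41b00000') ...
               hex2dec('41a00000') hex2dec('41900000')]);
typecast(vals, 'single')                    % gives 26 24 22 20 18

fid = fopen('example.bmt', 'r', 'ieee-le'); % hypothetical file, assuming little-endian floats
fseek(fid, 153788, 'bof');                  % start of the presumed 160x120 float block
T = fread(fid, [160 120], 'single')';       % file stores rows first, so transpose to 120x160
fclose(fid);
imagesc(T); colorbar;                       % quick look at the assumed temperature map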

Related

How to render in a specific bit depth?

How can OfflineAudioContext.startRendering() output an AudioBuffer that contains the bit depth of my choice (16 bits or 24 bits)? I know that I can set the sample rate of the output easily with AudioContext.sampleRate, but how do I set the bit depth?
My understanding of audio processing is pretty limited, so perhaps it's not as easy as I think it is.
Edit #1:
Actually, AudioContext.sampleRate is readonly, so if you have an idea on how to set the sample rate of the output, that would be great too.
Edit #2:
I guess the sample rate is inserted after the number of channels in the encoded WAV (in the DataView)
You can't do this directly because WebAudio only works with floating-point values. You'll have to do this yourself. Basically, take the output from the offline context and multiply every sample by 32768 (for 16 bits) or 8388608 (for 24 bits) and round to an integer. This assumes that the output from the context lies within the range of -1 to 1; if not, you'll have to do additional scaling. And finally, you might want to divide the final result by 32768 (or 8388608) to get floating-point numbers back. That depends on what the final application is.
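The scaling itself is plain arithmetic; here is a minimal sketch of the 16-bit case in Matlab notation (in a Web Audio app you would run the same per-sample math in JavaScript over the Float32Array returned by AudioBuffer.getChannelData):
x   = 0.5 * sin(2*pi*440*(0:999)/44100);   % example float samples in [-1, 1]
q16 = round(x * 32768);                    % scale to the 16-bit integer range
q16 = max(min(q16, 32767), -32768);        % clamp so +1.0 cannot overflow
y   = q16 / 32768;                         % optional: back to floats, now with 16-bit precision
% for 24 bits, use 8388608 as the scale factor instead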
For Edit #1, the answer is that when you construct the OfflineAudioContext, you have to specify the sample rate. Set that to the rate you want. Not sure what AudioContext.sampleRate has to do with this.
For Edit #2, there's not enough information to answer since you don't say what DataView is.

Performing Intra-frame Prediction in Matlab

I am trying to implement a hybrid video coding framework, as used in the H.264/MPEG-4 video standard, for which I need to perform 'Intra-frame Prediction' and 'Inter Prediction' (in other words, motion estimation) on a set of 30 frames for video processing in Matlab. I am working with the Mother-daughter frames.
Please note that this post is very similar to my previously asked question but this one is solely based on Matlab computation.
Edit:
I am trying to implement the framework shown below:
My question is: how do I perform the horizontal coding method, which is one of the nine modes of the intra coding framework? How are the pixels sampled?
What I find confusing is that intra prediction needs two inputs, the 8x8 blocks of the input frame and the 8x8 blocks of the reconstructed frame. But what happens when coding the very first block of the input frame, since there are no reconstructed pixels yet to perform horizontal coding with?
In the image above the whole system is a closed loop, so where do you start?
End of edit.
Question 1: Is intra-predicted image only for the first image (I-frame) of the sequence or does it need to be computed for all 30 frames?
I know that there are five intra coding modes which are horizontal, vertical, DC, Left-up to right-down and right-up to left-down.
Question 2: How do I actually get around comparing the reconstructed frame and the anchor frame (original current frame)?
Question 3: Why do I need a search area? Can the individual 8x8 blocks be used as a search area done one pixel at a time?
I know that pixels from reconstructed block are used for comparing, but is it done one pixel at a time within the search area? If so wouldn't that be too time consuming if 30 frames are to be processed?
Continuing on from our previous post, let's answer one question at a time.
Question #1
Usually, you use one I-frame and denote this as the reference frame. Once you use this, for each 8 x 8 block that's in your reference frame, you take a look at the next frame and figure out where this 8 x 8 block best moved in this next frame. You describe this displacement as a motion vector and you construct a P-frame that consists of this information. This tells you where the 8 x 8 block from the reference frame best moved in this frame.
Now, the next question you may be asking is how many frames it will take before we decide to use another reference frame? This is entirely up to you, and you set this up in your encoder settings. For digital broadcast and DVD storage, it is recommended that you generate an I-frame every 0.5 seconds or so. Assuming 24 frames per second, this means that you would need to generate an I-frame every 12 frames. This Wikipedia article is where I got this reference.
As for the intra-coding modes, these tell the encoder in which direction the neighbouring, already-reconstructed pixels are extrapolated when predicting the current block. Take a look at this paper that talks about the different prediction modes; Figure 1 provides a very nice summary of the various modes. In fact, there are nine altogether. Also take a look at this Wikipedia article to get better pictorial representations of the different mechanisms of prediction. To get the best accuracy, motion estimation is also done at 1/4-pixel accuracy by interpolating in between the pixels.
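On the specific horizontal mode asked about above, here is a minimal sketch of the usual idea, assuming 8x8 blocks: each row of the block is predicted from the already-reconstructed pixel immediately to its left, and only the residual is coded. The file names and block position are made up, and blocks with no left neighbour (such as the very first block) typically fall back to DC prediction with a fixed mid-grey value.
cur   = double(imread('frame001.png'));   % hypothetical current frame (grayscale)
recon = double(imread('recon001.png'));   % hypothetical reconstructed frame so far
r = 41; c = 41;                           % top-left corner of some 8x8 block, not in the first column
leftCol = recon(r:r+7, c-1);              % reconstructed neighbours just left of the block
pred    = repmat(leftCol, 1, 8);          % copy each left pixel across its row
resid   = cur(r:r+7, c:c+7) - pred;       % residual that would be transformed and quantized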
I'm not sure whether or not you need to implement just motion compensation with P-frames, or if you need B frames as well. I'm going to assume you'll be needing both. As such, take a look at this diagram I pulled off of Wikipedia:
Source: Wikipedia
This is a very common sequence for encoding frames in your video. It follows the format of:
IBBPBBPBBI...
There is a time axis at the bottom that tells you the sequence of frames that get sent to the decoder once you encode the frames. I-frames need to be encoded first, followed by P-frames, and then B-frames. A typical sequence of frames that are encoded in between the I-frames follow this format that you see in the figure. The chunk of frames in between I-frames is what is known as a Group of Pictures (GOP). If you remember from our previous post, B-frames use information from ahead and from behind its current position. As such, to summarize the timeline, this is what is usually done on the encoder side:
The I-frame is encoded, and then is used to predict the first P-frame
The first I-frame and the first P-frame are then used to predict the first and second B-frame that are in between these frames
The second P-frame is predicted using the first P-frame, and the third and fourth B-frames are created using information between the first P-frame and the second P-frame
Finally, the last frame in the GOP is an I-frame. This is encoded, then information between the second P-frame and the second I-frame (last frame) are used to generate the fifth and sixth B-frames
Therefore, what needs to happen is that you send I-frames first, then the P-frames, and then the B-frames after. The decoder has to wait for the P-frames before the B-frames can be reconstructed. However, this method of decoding is more robust because:
It minimizes the problem of possible uncovered areas.
P-frames and B-frames need less data than I-frames, so less data is transmitted.
However, B-frames will require more motion vectors, and so there will be some higher bit rates here.
Question #2
Honestly, what I have seen people do is compute a simple sum of squared differences (SSD) between one frame and another to compare similarity. You take your colour components (whether RGB, YUV, etc.) for each pixel from one frame in one position, subtract them from the colour components at the same spatial location in the other frame, square each difference and add them all together. You accumulate all of these differences for every location in your frame. The higher the value, the more dissimilar the one frame is from the next.
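A minimal Matlab sketch of that SSD measure, assuming two same-sized frames (the file names are placeholders):
A = double(imread('frame001.png'));   % hypothetical anchor frame
B = double(imread('frame002.png'));   % hypothetical reconstructed / next frame
ssd = sum((A(:) - B(:)).^2);          % squared differences over all pixels and channels;
                                      % a larger SSD means the frames are more dissimilar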
Another measure that is well known is called Structural Similarity where some statistical measures such as mean and variance are used to assess how similar two frames are.
There are a whole bunch of other video quality metrics that are used, and there are advantages and disadvantages when using any of them. Rather than telling you which one to use, I defer you to this Wikipedia article so you can decide which one to use for yourself depending on your application. This Wikipedia article describes a whole bunch of similarity and video quality metrics, and the buck doesn't stop there. There is still on-going research on what numerical measures best capture the similarity and quality between two frames.
Question #3
When searching for the best block from an I-frame that has moved in a P-frame, you need to restrict the searching to a finite sized windowed area from the location of this I-frame block because you don't want the encoder to search all of the locations in the frame. This would simply be too computationally intensive and would thus make your decoder slow. I actually mentioned this in our previous post.
Using one pixel to search for another pixel in the next frame is a very bad idea because of the minuscule amount of information that this single pixel contains. The reason why you compare blocks at a time when doing motion estimation is because usually, blocks of pixels have a lot of variation inside the blocks which are unique to the block itself. If we can find this same variation in another area in your next frame, then this is a very good candidate that this group of pixels moved together to this new block. Remember, we're assuming that the frame rate for video is adequately high enough so that most of the pixels in your frame either don't move at all, or move very slowly. Using blocks allows the matching to be somewhat more accurate.
Blocks are compared at a time, and the way blocks are compared is using one of those video similarity measures that I talked about in the Wikipedia article I referenced. You are certainly correct in that doing this for 30 frames would indeed be slow, but there are implementations that exist that are highly optimized to do the encoding very fast. One good example is FFMPEG. In fact, I use FFMPEG at work all the time. FFMPEG is highly customizable, and you can create an encoder / decoder that takes advantage of the architecture of your system. I have it set up so that encoding / decoding uses all of the cores on my machine (8 in total).
This doesn't really answer the actual block comparison itself. Actually, the H.264 standard has a bunch of prediction mechanisms in place so that you're not looking at all of the blocks in an I-frame to predict the next P-frame (or one P-frame to the next P-frame, etc.). This alludes to the different prediction modes in the Wikipedia article and in the paper that I referred you to. The encoder is intelligent enough to detect a pattern, and then generalize an area of your image where it believes that this will exhibit the same amount of motion. It skips this area and moves onto the next.
This assignment (in my opinion) is way too broad. There are so many intricacies in doing motion prediction / compensation that there is a reason why most video engineers already use available tools to do the work for us. Why reinvent the wheel when it has already been perfected, right?
I hope this has adequately answered your questions. I believe that I have given you more questions than answers really, but I hope that this is enough for you to delve into this topic further to achieve your overall goal.
Good luck!
Question 1: Is intra-predicted image only for the first image (I-frame) of the sequence or does it need to be computed for all 30 frames?
I know that there are five intra coding modes which are horizontal, vertical, DC, Left-up to right-down and right-up to left-down.
Answer: intra prediction need not be used for all the frames.
Question 2: How do I actually get around comparing the reconstructed frame and the anchor frame (original current frame)?
Question 3: Why do I need a search area? Can the individual 8x8 blocks be used as a search area done one pixel at a time?
Answer: we need to use a block-matching algorithm to find the motion vector, so a search area is required. Normally the search area should be larger than the block size; the larger the search area, the more the computation and the higher the accuracy.
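For illustration, a minimal exhaustive (full-search) block-matching sketch in Matlab, assuming two grayscale frames whose dimensions are multiples of the block size; the file names, the block size of 8 and the search range of +/-7 are just example choices:
ref = double(imread('frame001.png'));   % hypothetical reference frame (grayscale)
cur = double(imread('frame002.png'));   % hypothetical current frame (grayscale)
B = 8; R = 7;                           % block size and search range
[H, W] = size(cur);
mvs = zeros(H/B, W/B, 2);               % one motion vector per 8x8 block
for by = 1:B:H-B+1
  for bx = 1:B:W-B+1
    blk = cur(by:by+B-1, bx:bx+B-1);
    best = inf; bestdy = 0; bestdx = 0;
    for dy = -R:R
      for dx = -R:R
        y = by + dy; x = bx + dx;
        if y < 1 || x < 1 || y+B-1 > H || x+B-1 > W, continue; end
        cand = ref(y:y+B-1, x:x+B-1);
        cost = sum(sum((blk - cand).^2));        % SSD cost of this candidate block
        if cost < best, best = cost; bestdy = dy; bestdx = dx; end
      end
    end
    mvs((by-1)/B+1, (bx-1)/B+1, :) = [bestdy bestdx];
  end
end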

Modifying Every Column of Every Frame of a Video

I would like to write a program that will take a video as input, create an output video file, and will (starting after a certain number of frames), begin writing modified frames to the output file frame by frame.
The modification will need to work on individual columns of pixels, one at a time.
Viewing this as a problem to be solved in Matlab, with each frame as a matrix... I cannot think of a way to make this computationally tractable.
I am hoping that someone might be able to offer suggestions on how I might begin to approach this problem.
Here are some details, in case it helps:
I'm interested in transforming a video in the following way:
Viewing a video as a sequence of (MxN) matrices, where each matrix is called a frame:
Take an input video and create new file for output video
For each column V in frame(i) of output video, replace this column by
column V in frame(i + V - N) of the input video.
For example: the new right-most column (column N) of frame(i) will contain column N of frame(i + N - N) = frame(i)... so that there is no replacement. The new 2nd to right-most column (column N-1) of frame(i) will contain column N-1 of [frame(i+N-1-N) = frame(i-1)].
In order to make this work (i.e. in order to not run out of previous frames), this column replacement will start on frame N of the video.
So... This is basically a variable delay running from left to right?
As you say, you do have two ways of going about this:
a) Use lots of memory
b) Use lots of file access
Your memory requirements increase as a cube power of the size of the video - the size of each frame increases, AND the number of previous frames you need to have open or reference increases. I.e. doubling frame size will require 4x memory per frame, and 2x number of frames open.
I think that Matlab's memory management will probably make this hard to do for e.g. a 1080p video, unless you have a pretty high-end workstation. Do you? A quick test-read of a 720p video gives 1.2MB per frame. 1080p would then be approx 5MB per frame, and you would need to have 1920 frames open: approx 10GB needed.
It will be more efficient to load frames individually, if you don't have enough memory - otherwise you will be using pagefiles and that'll be slower than loading frame-by-frame.
Your basic code reading each frame individually could be something like this:
VR = VideoReader('My_Input_Video_Filename.avi');
VW = VideoWriter('My_Output_Video_Filename.avi','MPEG-4');
open(VW);                                   % the VideoWriter must be opened before writing
NumInFrames = get(VR,'NumberOfFrames');
InWidth     = get(VR,'Width');
InHeight    = get(VR,'Height');
OutFrame = zeros(InHeight,InWidth,3,'uint8');
for frame = InWidth+1:NumInFrames
    for subindex = 1:InWidth
        % column V of output frame(i) comes from frame(i+V-N) of the input,
        % so the rightmost column (V = InWidth) has no delay
        CData = read(VR, frame - (InWidth - subindex));
        OutFrame(:,subindex,:) = CData(:,subindex,:);
    end
    writeVideo(VW,OutFrame);
end
close(VW);
This will probably be slow, and I haven't fully checked it works, but it does use a minimum amount of memory.
The best case for minimum file access is probably using a ring buffer arrangement and the maximum amount of memory, which would look something like this:
VR = VideoReader('My_Input_Video_Filename.avi');
VW = VideoWriter('My_Output_Video_Filename.avi','MPEG-4');
open(VW);                                         % the VideoWriter must be opened before writing
NumInFrames = get(VR,'NumberOfFrames');
InWidth     = get(VR,'Width');
InHeight    = get(VR,'Height');
CDatas = read(VR,[1 InWidth]);                    % preload the first InWidth frames into the ring buffer
BufferIndex = 1;
OutFrame = zeros(InHeight,InWidth,3,'uint8');
for frame = InWidth+1:NumInFrames
    CDatas(:,:,:,BufferIndex) = read(VR,frame);   % overwrite the oldest buffered frame
    % pick, for each output column, which ring-buffer slot holds the frame with the right delay
    tempindices = circshift(1:InWidth,[1,-1*BufferIndex]);
    for subindex = 1:InWidth
        OutFrame(:,subindex,:) = CDatas(:,subindex,:,tempindices(subindex));
    end
    writeVideo(VW,OutFrame);
    BufferIndex = mod(BufferIndex,InWidth) + 1;   % wrap around, staying in the range 1..InWidth
end
close(VW);
The buffer indexing code may need some tweaking there, but something along those lines would be a minimum file access, maximum memory use solution.
For a given PC with more or less memory, you can implement somewhere in between these two as a solution (i.e. reading somewhere between 1 and all frames per iteration) as a best-case.
Matlab will be quite slow for this kind of task, but it is a good way of getting your algorithm right and working out indexing bugs and that kind of thing. Converting to a compiled language would give a good increase in speed - I once converted a Matlab script to a C# program in a couple of hours, and it gave a 10x increase in speed over an optimised script whose run time was dominated by the number of file reads.
Hope this helps, good luck!

How is it possible to encode black/white picture into ".wav"-file?

How is it possible to encode a black/white picture into a ".wav" file? I know that it is possible with the help of steganography, but I don't know its algorithms. What algorithms exist? And what books/sources are the best for understanding their principles?
Edited:
Actually I have a stereo wav file. My task is to decode pictures from it. The task says that the frequencies of the left channel give the X coordinate and the frequencies of the right channel give the Y coordinate of a Cartesian coordinate system. These points compose a picture with a text message. So I must write a program for this, and I have no idea what I should do.
Probably the simplest version of steganography using a wav file would be to use 16-bit samples in the wave file, but only dedicate the 15 most significant bits to sound. In the least significant bit of each sample, you'd encode one pixel of your black and white picture.
Regenerating the picture would require software to open the wave file, take the least significant bit from each sample, and put those bits back together into (for example) a JPEG file.
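A minimal Matlab sketch of extracting such LSB-hidden bits, assuming (purely for illustration) a 160x120 black/white image hidden one bit per 16-bit sample of the left channel, stored row by row; the file name and dimensions are made up:
y = audioread('stego.wav', 'native');     % 'native' keeps the raw int16 sample values
bits = mod(double(y(1:160*120, 1)), 2);   % least significant bit of each left-channel sample
img  = reshape(bits, 160, 120)';          % rebuild the 120x160 image, one row per 160 bits
imshow(logical(img));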
To put things into perspective, a CD has two channels containing 16 bit samples at a rate of 44.1 KHz, so you'd only need the LSBs from around 10 seconds of sound to encode a fairly typical full-color JPEG (e.g., 100KB or so). A wave file of a typical ~3 minute pop song could hide around 15-20 full-color pictures pretty easily.
Edit: (to reply to edited answer). This is a little tougher to deal with. An individual sample can't represent any frequency; it just represents the amplitude at a given point in time. To get frequency, you need a number of samples over a period of time -- and you need to know the exact period to convert.
Once you know that, you basically do an FFT on the samples. That will tell you the relative strengths of signal at all possible frequencies. Presumably, you'd pick the strongest one and scale appropriately. Do the same for the other channel and draw a pixel at that point.
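A minimal Matlab sketch of that approach, assuming the samples are analysed in fixed-length windows (the file name and the 1024-sample window length are just example choices):
[y, fs] = audioread('message.wav');      % hypothetical stereo input
N  = 1024;                               % samples per analysis window
nW = floor(size(y,1) / N);
f  = (0:N-1) * fs / N;                   % frequency of each FFT bin
pts = zeros(nW, 2);
for k = 1:nW
    seg = y((k-1)*N+1 : k*N, :);
    for ch = 1:2
        S = abs(fft(seg(:,ch)));
        [~, idx] = max(S(2:N/2));        % strongest bin, ignoring DC
        pts(k, ch) = f(idx + 1);
    end
end
scatter(pts(:,1), pts(:,2), 4, 'filled');    % left-channel peak = x, right-channel peak = y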
Your ears are not sensitive to small changes in a sound file.
Wav files are UNCOMPRESSED data, so a wav is just a stream of 16- to 24-bit samples. Your ears cannot notice slight differences between bits. All you need to do is periodically inject bit values that represent an image into the data.
So if you insert one pixel for every 1000 data points you can hide an image (without even encrypting it) in a wave file. If a user plays the file they CANNOT hear it.
When you save the file on your computer, or on a computer far away, you can use a decoding tool that is aware of the hiding technique.

Mixing sound files of different size

I want to mix audio files of different sizes into one single .wav file without clipping any file, i.e. the resulting file should be as long as the largest of the input files.
There is a sample through which we can mix files of the same size
(http://www.modejong.com/iOS/#ex4, Example 4).
I modified the code to get the mixed file as a .wav file,
but I am not able to understand how to modify this code for files of unequal size.
If someone can help me out with a code snippet, I'll be really thankful.
It should be as easy as sending all the files to the mixer simultaneously. When any single file gets to the end, just treat it as if the remainder is filled with zeroes. When all files get to the end, you are done.
Note that the example code says it returns an error if there would be clipping (the sum of the waves is greater than the maximum representable value). This condition is more likely if you are mixing multiple inputs. The best way around it is to create some "headroom" in the input waves. You can either do this in preprocessing, by ensuring that each wave's volume is no more than X% of maximum (~80-90%, depending on the number of inputs), or do it dynamically in the mixer code by multiplying each sample by some value < 1.0 as you add it to the mix.
If you are selecting the waves to mix at runtime and failure due to clipping is unacceptable, you will need to modify the sample code to pin the values at max/min instead of returning an error. Don't just let them overflow or you will get noisy artifacts.
Clipping creates artifacts as well, but when you haven't created enough headroom before mixing, it is definitely preferable to overflow. It is a more familiar-sounding type of distortion, similar to what you get when you overdrive your speakers. See this Wikipedia article on clipping:
Clipping is preferable to the alternative in digital systems—wrapping—which occurs if the digital hardware is allowed to "overflow", ignoring the most significant bits of the magnitude, and sometimes even the sign of the sample value, resulting in gross distortion of the signal.
How I'd do it:
Much like the mix_buffers function that you linked to, but pass in 2 parameters for mixbufferNumSamples. Iterate over the whole of the longer of the two buffers. When the index has gone beyond the end of the shorter buffer, simply set the sample from that buffer to 0 for the rest of the function.
If you must avoid clipping and do it in real-time and you know nothing else about the two sounds, you must provide enough headroom. The simplest method is by halving each of the samples before mixing:
mixed = s1/2 + s2/2;
This ensures that the resultant mixed sample won't overflow an int16_t. It will have the side effect of making everything quieter though.
If you can run it offline, you can calculate a scale factor to apply to both waveforms which will keep the peaks when summed below the maximum allowed value.
Or you could mix them all at full volume to an int32_t buffer, keeping track of the largest (magnitude) mixed sample and then go back through the buffer multiplying each sample by a scale factor which will make that extreme sample just reach the +32767/-32768 limits.
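For what it's worth, here is a minimal Matlab sketch of that last approach for two files of different lengths, assuming both inputs share the same sample rate and channel count (the file names are placeholders): the shorter file is padded with silence, both are summed at full volume, and the mix is scaled back only if its peak would clip.
[a, fs] = audioread('long.wav');                   % hypothetical input files
[b, ~]  = audioread('short.wav');
L   = max(size(a,1), size(b,1));
pad = @(x) [x; zeros(L - size(x,1), size(x,2))];   % treat the missing tail as silence
mix = pad(a) + pad(b);                             % full-volume sum (doubles, so no wrap-around)
peak = max(abs(mix(:)));
if peak > 1
    mix = mix / peak;                              % scale so the loudest sample just fits
end
audiowrite('mix.wav', mix, fs);                    % output is as long as the longest input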