Morphology operations using Matlab [closed] - matlab

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Questions asking us to recommend or find a book, tool, software library, tutorial or other off-site resource are off-topic for Stack Overflow as they tend to attract opinionated answers and spam. Instead, describe the problem and what has been done so far to solve it.
Closed 8 years ago.
Improve this question
Here is the problem:
A camera takes an image I of a penny, a dime, and a quarter lying on a white
background and the coins do not overlap. Suppose that thresholding creates a binary image B successfully
with 1 for the coin regions and 0 for the background.
You are given the known diameters the coins d_p, d_d, and d_q in pixels (note that d_d < d_p < d_q). How do I use morphology operations (dilation, erosion, opening, and
closing) and logical and set operations (AND, OR, NOT, and set difference), to produce three binary output images P, D, and Q, where P should contain just the penny, D should contain just the dime, and Q should contain just the quarter?
Can anyone give the codes or some hints? Thanks in advance!

This obviously looks like homework so I won't write any code for you, but I'll give you some hints to push you in the right direction. The situation you described is highly idealized and not reflective of real-world situations.... which is actually great as it makes coding a lot more simpler. I'm going to assume that the picture was taken directly above the surface with the coins and not on an angle.
You already know the diameters of each of the coins, and because the diameters are in pixels, this makes this problem a whole lot easier. As such, you would specify three structuring elements that are circular that have the same diameters for each of the coins.
First do a morphological opening on B using the largest structuring element, which is the quarter. Opening is an erosion, followed by a dilation. One thing you should know about erosion is that any objects that are smaller than the structuring element will disappear while those that are larger will have pixels in the object that remain. As such, by doing a closing, you would remove the penny and dime, while the quarter will be fully reconstructed. One good thing about opening is that if your structuring element is smaller than the object itself, doing an opening should keep the object the same, provided that the structuring element and object follow more or less the same characteristics. Because your structuring element is circular and so are the coins, we're good to go. As such, this is your first image Q.
Next, use the second largest structuring element, which is the penny, and do an opening on the image B. What will happen now is that the dime should disappear while the quarter and the penny should still remain. As such, do a set difference between this image and Q. Our result is just the dime that is left, and so this is P.
Finally for the dime, you actually don't even need to do any morphology. Do a logical OR operation to combine the quarter Q and penny P to get a combined image. After, do a set difference between the original image B and this combined image. You'll then isolate the dime, which is now D.
This should be enough to get you started. Good luck!

Related

N-Queens puzzle, but with all chess pieces

I want to solve a problem similar to N-Queens one, but:
all chess pieces are available
user inputs how many pieces of what kind are to be placed (e.g. 3 rooks, 4 knights, 1 bishop)
I'm lying on the floor for some time now, but can't come up with how to adjust the backtracking algorithm for this purpose. I will be very grateful for any kind of help.
In principle, the same approach as in the classical N-Queen problem should work:
Find an empty, non-attacked square where you can place your next piece
Search the new position recursively (or if you already placed all your pieces, output the solution)
Take back the last placed piece and repeat (goto step 1) until you have tried all squares where you can place the next piece
The only difference to the classical N-Queens problem is that the different pieces have different attack patterns. And some common optimizations might no longer work. For instance, if you have pawns, it breaks symmetry as they only attack the squares in front of them. (Though you still have one symmetry axis even with pawns.)
I would expect the backtracking algorithm to be more efficient if you start with placing the pieces first that cover the most squares: first queens, then rooks and bishops, then knights and kings, and finally pawns.

MATLAB: Using CONVN for moving average on Matrix

I'm looking for a bit of guidance on using CONVN to calculate moving averages in one dimension on a 3d matrix. I'm getting a little caught up on the flipping of the kernel under the hood and am hoping someone might be able to clarify the behaviour for me.
A similar post that still has me a bit confused is here:
CONVN example about flipping
The Problem:
I have daily river and weather flow data for a watershed at different source locations.
So the matrix is as so,
dim 1 (the rows) represent each site
dim 2 (the columns) represent the date
dim 3 (the pages) represent the different type of measurement (river height, flow, rainfall, etc.)
The goal is to try and use CONVN to take a 21 day moving average at each site, for each observation point for each variable.
As I understand it, I should just be able to use a a kernel such as:
ker = ones(1,21) ./ 21.;
mat = randn(150,365*10,4);
avgmat = convn(mat,ker,'valid');
I tried playing around and created another kernel which should also work (I think) and set ker2 as:
ker2 = [zeros(1,21); ker; zeros(1,21)];
avgmat2 = convn(mat,ker2,'valid');
The question:
The results don't quite match and I'm wondering if I have the dimensions incorrect here for the kernel. Any guidance is greatly appreciated.
Judging from the context of your question, you have a 3D matrix and you want to find the moving average of each row independently over all 3D slices. The code above should work (the first case). However, the valid flag returns a matrix whose size is valid in terms of the boundaries of the convolution. Take a look at the first point of the post that you linked to for more details.
Specifically, the first 21 entries for each row will be missing due to the valid flag. It's only when you get to the 22nd entry of each row does the convolution kernel become completely contained inside a row of the matrix and it's from that point where you get valid results (no pun intended). If you'd like to see these entries at the boundaries, then you'll need to use the 'same' flag if you want to maintain the same size matrix as the input or the 'full' flag (which is default) which gives you the size of the output starting from the most extreme outer edges, but bear in mind that the moving average will be done with a bunch of zeroes and so the first 21 entries wouldn't be what you expect anyway.
However, if I'm interpreting what you are asking, then the valid flag is what you want, but bear in mind that you will have 21 entries missing to accommodate for the edge cases. All in all, your code should work, but be careful on how you interpret the results.
BTW, you have a symmetric kernel, and so flipping should have no effect on the convolution output. What you have specified is a standard moving averaging kernel, and so convolution should work in finding the moving average as you expect.
Good luck!

Performing Intra-frame Prediction in Matlab

I am trying to implement a hybrid video coding framework which is used in the H.264/MPEG-4 video standard for which I need to perform 'Intra-frame Prediction' and 'Inter Prediction' (which in other words is motion estimation) of a set of 30 frames for video processing in Matlab. I am working with Mother-daughter frames.
Please note that this post is very similar to my previously asked question but this one is solely based on Matlab computation.
Edit:
I am trying to implement the framework shown below:
My question is how to perform horizontal coding method which is one of the nine methods of Intra Coding framework? How are the pixels sampled?
What I find confusing is that Intra Prediction needs two inputs which are the 8x8 blocks of input frame and the 8x8 blocks of reconstructed frame. But what happens when coding the very first block of the input frame since there will be no reconstructed pixels to perform horizontal coding?
In the image above the whole system is a closed loop where do you start?
END:
Question 1: Is intra-predicted image only for the first image (I-frame) of the sequence or does it need to be computed for all 30 frames?
I know that there are five intra coding modes which are horizontal, vertical, DC, Left-up to right-down and right-up to left-down.
Question 2: How do I actually get around comparing the reconstructed frame and the anchor frame (original current frame)?
Question 3: Why do I need a search area? Can the individual 8x8 blocks be used as a search area done one pixel at a time?
I know that pixels from reconstructed block are used for comparing, but is it done one pixel at a time within the search area? If so wouldn't that be too time consuming if 30 frames are to be processed?
Continuing on from our previous post, let's answer one question at a time.
Question #1
Usually, you use one I-frame and denote this as the reference frame. Once you use this, for each 8 x 8 block that's in your reference frame, you take a look at the next frame and figure out where this 8 x 8 block best moved in this next frame. You describe this displacement as a motion vector and you construct a P-frame that consists of this information. This tells you where the 8 x 8 block from the reference frame best moved in this frame.
Now, the next question you may be asking is how many frames is it going to take before we decide to use another reference frame? This is entirely up to you, and you set this up in your decoder settings. For digital broadcast and DVD storage, it is recommended that you generate an I-frame every 0.5 seconds or so. Assuming 24 frames per second, this means that you would need to generate an I-frame every 12 frames. This Wikipedia article was where I got this reference.
As for the intra-coding modes, these tell the encoder in what direction you should look for when trying to find the best matching block. Actually, take a look at this paper that talks about the different prediction modes. Take a look at Figure 1, and it provides a very nice summary of the various prediction modes. In fact, there are nine all together. Also take a look at this Wikipedia article to get better pictorial representations of the different mechanisms of prediction as well. In order to get the best accuracy, they also do subpixel estimation at a 1/4 pixel accuracy by doing bilinear interpolation in between the pixels.
I'm not sure whether or not you need to implement just motion compensation with P-frames, or if you need B frames as well. I'm going to assume you'll be needing both. As such, take a look at this diagram I pulled off of Wikipedia:
Source: Wikipedia
This is a very common sequence for encoding frames in your video. It follows the format of:
IBBPBBPBBI...
There is a time axis at the bottom that tells you the sequence of frames that get sent to the decoder once you encode the frames. I-frames need to be encoded first, followed by P-frames, and then B-frames. A typical sequence of frames that are encoded in between the I-frames follow this format that you see in the figure. The chunk of frames in between I-frames is what is known as a Group of Pictures (GOP). If you remember from our previous post, B-frames use information from ahead and from behind its current position. As such, to summarize the timeline, this is what is usually done on the encoder side:
The I-frame is encoded, and then is used to predict the first P-frame
The first I-frame and the first P-frame are then used to predict the first and second B-frame that are in between these frames
The second P-frame is predicted using the first P-frame, and the third and fourth B-frames are created using information between the first P-frame and the second P-frame
Finally, the last frame in the GOP is an I-frame. This is encoded, then information between the second P-frame and the second I-frame (last frame) are used to generate the fifth and sixth B-frames
Therefore, what needs to happen is that you send I-frames first, then the P-frames, and then the B-frames after. The decoder has to wait for the P-frames before the B-frames can be reconstructed. However, this method of decoding is more robust because:
It minimizes the problem of possible uncovered areas.
P-frames and B-frames need less data than I-frames, so less data is transmitted.
However, B-frames will require more motion vectors, and so there will be some higher bit rates here.
Question #2
Honestly, what I have seen people do is do a simple Sum-of-Squared Differences between one frame and another to compare similarity. You take your colour components (whether it be RGB, YUV, etc.) for each pixel from one frame in one position, subtract these with the colour components in the same spatial location in the other frame, square each component and add them all together. You accumulate all of these differences for every location in your frame. The higher the value, the more dissimilar this is between the one frame and the next.
Another measure that is well known is called Structural Similarity where some statistical measures such as mean and variance are used to assess how similar two frames are.
There are a whole bunch of other video quality metrics that are used, and there are advantages and disadvantages when using any of them. Rather than telling you which one to use, I defer you to this Wikipedia article so you can decide which one to use for yourself depending on your application. This Wikipedia article describes a whole bunch of similarity and video quality metrics, and the buck doesn't stop there. There is still on-going research on what numerical measures best capture the similarity and quality between two frames.
Question #3
When searching for the best block from an I-frame that has moved in a P-frame, you need to restrict the searching to a finite sized windowed area from the location of this I-frame block because you don't want the encoder to search all of the locations in the frame. This would simply be too computationally intensive and would thus make your decoder slow. I actually mentioned this in our previous post.
Using one pixel to search for another pixel in the next frame is a very bad idea because of the minuscule amount of information that this single pixel contains. The reason why you compare blocks at a time when doing motion estimation is because usually, blocks of pixels have a lot of variation inside the blocks which are unique to the block itself. If we can find this same variation in another area in your next frame, then this is a very good candidate that this group of pixels moved together to this new block. Remember, we're assuming that the frame rate for video is adequately high enough so that most of the pixels in your frame either don't move at all, or move very slowly. Using blocks allows the matching to be somewhat more accurate.
Blocks are compared at a time, and the way blocks are compared is using one of those video similarity measures that I talked about in the Wikipedia article I referenced. You are certainly correct in that doing this for 30 frames would indeed be slow, but there are implementations that exist that are highly optimized to do the encoding very fast. One good example is FFMPEG. In fact, I use FFMPEG at work all the time. FFMPEG is highly customizable, and you can create an encoder / decoder that takes advantage of the architecture of your system. I have it set up so that encoding / decoding uses all of the cores on my machine (8 in total).
This doesn't really answer the actual block comparison itself. Actually, the H.264 standard has a bunch of prediction mechanisms in place so that you're not looking at all of the blocks in an I-frame to predict the next P-frame (or one P-frame to the next P-frame, etc.). This alludes to the different prediction modes in the Wikipedia article and in the paper that I referred you to. The encoder is intelligent enough to detect a pattern, and then generalize an area of your image where it believes that this will exhibit the same amount of motion. It skips this area and moves onto the next.
This assignment (in my opinion) is way too broad. There are so many intricacies in doing motion prediction / compensation that there is a reason why most video engineers already use available tools to do the work for us. Why invent the wheel when it has already been perfected, right?
I hope this has adequately answered your questions. I believe that I have given you more questions than answers really, but I hope that this is enough for you to delve into this topic further to achieve your overall goal.
Good luck!
Question 1: Is intra-predicted image only for the first image (I-frame) of the sequence or does it need to be computed for all 30 frames?
I know that there are five intra coding modes which are horizontal, vertical, DC, Left-up to right-down and right-up to left-down.
Answer: intra prediction need not be used for all the frames.
Question 2: How do I actually get around comparing the reconstructed frame and the anchor frame (original current frame)?
Question 3: Why do I need a search area? Can the individual 8x8 blocks be used as a search area done one pixel at a time?
Answer: we need to use the block matching algo for finding the motion vector. so search area is reqd. Normally size of the search area should be larger than the block size. larger the search area, more the computation and higher the accuracy.

Graph/tree representation and recursion

I'm currently writing an optimization algorithm in MATLAB, at which I completely suck, therefore I could really use your help. I'm really struggling to find a good way of representing a graph (or well more like a tree with several roots) which would look more or less like this:
alt text http://img100.imageshack.us/img100/3232/graphe.png
Basically 11/12/13 are our roots (stage 0), 2x is stage1, 3x stage2 and 4x stage3. As you can see nodes from stageX are only connected to several nodes from stage(X+1) (so they don't have to be connected to all of them).
Important: each node has to hold several values (at least 3-4), one will be it's number and at least two other variables (which will be used to optimize the decisions).
I do have a simple representation using matrices but it's really hard to maintain, so I was wondering is there a good way to do it?
Second question: when I'm done with that representation I need to calculate how good each route (from roots to the end) is (like let's say I need to compare is 11-21-31-41 the best or is 11-21-31-42 better) to do that I will be using the variables that each node holds. But the values will have to be calculated recursively, let's say we start at 11 but to calcultate how good 11-21-31-41 is we first need to go to 41, do some calculations, go to 31, do some calculations, go to 21 do some calculations and then we can calculate 11 using all the previous calculations. Same with 11-21-31-42 (we start with 42 then 31->21->11). I need to check all the possible routes that way. And here's the question, how to do it? Maybe a BFS/DFS? But I'm not quite sure how to store all the results.
Those are some lengthy questions, but I hope I'm not asking you for doing my homework (as I got all the algorithms, it's just that I'm not really good at matlab and my teacher wouldn't let me to do it in java).
Granted, it may not be the most efficient solution, but if you have access to Matlab 2008+, you can define a node class to represent your graph.
The Matlab documentation has a nice example on linked lists, which you can use as a template.
Basically, a node would have a property 'linksTo', which points to the index of the node it links to, and a method to calculate the cost of each of the links (possibly with some additional property that describe each link). Then, all you need is a function that moves down each link, and brings the cost(s) with it when it moves back up.

Separation of singing voice from music [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 4 years ago.
Improve this question
I want to know how to perform "spectral change detection" for the classification of vocal & non-vocal segments of a song. We need to find the spectral changes from a spectrogram. Any elaborate information about this, particularly involving MATLAB?
Separating out distinct signals from audio is a very active area of research, and it is a very hard problem. This is often called Blind Signal Separation in the literature. (There is some MATLAB demo code in the previous link.
Of course, if you know that there is vocal in the music, you can use one of the many vocal separation algorithms.
As others have noted, solving this problem using only raw spectrum analysis is a dauntingly hard problem, and you're unlikely to find a good solution to it. At best, you might be able to extract some of the vocals and a few extra crossover frequencies from the mix.
However, if you can be more specific about the nature of the audio material you are working with here, you might be able to get a little bit further.
In the worst case, your material would be normal mp3's of regular songs -- ie, a full band + vocalist. I have a feeling that this is the case you are probably looking at given the nature of your question.
In the best case, you have access to the multitrack studio recordings and have at least a full mixdown and an instrumental track, in which case you could extract the vocal frequencies from the mix. You would do this by generating an impulse response from one of the tracks and applying it to the other.
In the middle case, you are dealing with simple music which you could apply some sort of algorithm tuned to the parameters of the music to. For instance, if you are dealing with electronic music, you can use to your advantage the stereo width of the track to eliminate all mono elements (ie, basslines + kicks) to extract the vocals + other panned instruments, and then apply some type of filtering and spectrum analysis from there.
In short, if you are planning on making an all-purpose algorithm to generate clean acapella cuts from arbitrary source material, you're probably biting off more than you can chew here. If you can specifically limit your source material, then you have a number of algorithms at your disposal depending on the nature of those sources.
This is hard. If you can do this reliably you will be an accomplished computer scientist. The most promising method I read about used the lyrics to generate a voice only track for comparison. Again, if you can do this and write a paper about it you will be famous (amongst computer scientists). Plus you could make a lot of money by automatically generating timings for karaoke.
If you just want to decide wether a block of music is clean a-capella or with instrumental background, you could probably do that by comparing the bandwidth of the signal to a normal human singer bandwidth. Also, you could check for the base frequency, which can only be in a pretty limited frequency range for human voices.
Still, it probably won't be easy. However, hearing aids do this all the time, so it is clearly doable. (Though they typically search for speech, not singing)
first sync the instrumental with the original, make sure they are the same length and bitrate and start and end at the exact time and convert them to .wav
then do something like
I = wavread(instrumental.wav);
N = wavread(normal.wav);
i = inv(I);
A = (N - i); // it could be A = (N * i) or A = (N + i) you'll have to play around
wavwrite(A, acapella.wav)
that should do it.. a little linear algebra goes a long way.