How to make sense (handle) when computes logarithm of zero in prior information - matlab

I am working in image classification. I am using an information that called prior probability (in Bayesian rule). It has range in [0,1]. And it requires computing in logarithm. However, as you know, logarithm of zero number is Inf.
For example, given an pixel x in image I (size 3 by 3) with an cost function such as
Cost(x)=30+log(prior(x))
where prior is an matrix 3 by 3
prior=[ 0 0 0.5;
1 1 0.2;
0.4 0 0]
I =[ 1 2 3;
4 5 6;
7 8 9]
I want to compute cost of x=1 then
cost(x=1)=30+log(0)
Now, log(0) is Inf. Then result cost(x=1) also Inf. Based on my assumption that prior=0 that mean the given pixel belongs to background, and prior=1 that mean the given pixel belongs to foreground.
My question is that how to compute log(prior) satisfy my assumption.
I am using Matlab to do it. I think that log(0) becomes very small negative value. And I just set it is -9 as my code
%% Handle with log(0)
prior(prior==0.0) = NaN;
%% Compute log
log_prior=log(prior);
%% Assume that e^-9 very near 0.
log_prior(isnan(log_prior)) = -9;
UPDATE: To make clearly what I am doing. Let see the Bayesian rule. My task is that how to assign an given pixel x belongs to Background (BG) or Foreground (FG). It will depends on the probability
P(x∈BG|x)=P(x|x∈BG)P(x∈BG)/P(x)
In which P(x|x∈BG) is likelihood function and assume that it is approximated by Gaussian distribution, P(x∈BG) is prior term and P(x) can be ignore due to it is const
Using Maximum-a-Posteriori (MAP) Estimation we can map the above equation in to log space (to resolve exponential in Gaussian function)
Cost(x)=log(P(x∈BG|x))=log(P(x|x∈BG))+log(P(x∈BG))
To make simple, let assume log(P(x|x∈BG))=30, log(P(x∈BG)) is log(prior) then my cost function can rewritten as
Cost(x)=30+log(prior(x))
Now problem is that prior is within [0,1] then it logarithm is -Inf. As the chepner said, we can add eps value as
log(prior+eps)
However, log(eps) is very a lager negative number. It will be affected my cost function (also becomes very large negative number). Then the first term in my cost function (30) becomes not necessary. Based on my assumption that log(x)=1 then the pixel x will be BG and prior(x)=1 will be FG. How to make handle with my log(prior) when I compute my cost function?

The correct thing to do, before fiddling with Matlab, is to try to understand your problem. Ask yourself "what does it mean for the prior probability to vanish?". The answer is given by Bayes theorem, one form of which is:
posterior = likelihood * prior / normalization
So places where the prior is nil are, by definition, places where you are certain that your events (the things whose probabilities you are computing) cannot happen, regardless of their apparent likelihood (i.e. "cost"). So they are not interesting for you. You just recognize that and skip them.

Related

Dividing a normal distribution into regions of equal probability in Matlab

Consider a Normal distribution with mean 0 and standard deviation 1. I would like to divide this distribution into 9 regions of equal probability and take a random sample from each region.
It sounds like you want to find the values that divide the area under the probability distribution function into segments of equal probability. This can be done in matlab by applying the norminv function.
In your particular case:
segmentBounds = norminv(linspace(0,1,10),0,1)
Any two adjacent values of segmentBounds now describe the boundaries of segments of the Normal probability distribution function such that each segment contains one ninth of the total probability.
I'm not sure exactly what you mean by taking random numbers from each sample. One approach is to sample from each region by performing rejection sampling. In short, for each region bounded by x0 and x1, draw a sample from y = normrnd(0,1). If x0 < y < x1, keep it. Else discard it and repeat.
It's also possible that you intend to sample from these regions uniformly. To do this you can try rand(1)*(x1-x0) + x0. This will produce problems for the extreme quantiles, however, since the regions extend to +/- infinity.

Matlab create array random samples Gaussian distribution

I'd like to make an array of random samples from a Gaussian distrubution.
Mean value is 0 and variance is 1.
If I take enough samples, I would think my maximum value of a sample can be 0+1=1.
However, I find that I get values like 4.2891 ...
My code:
x = 0+sqrt(1)*randn(100000,1);
mean(x)
var(x)
max(x)
This would give me a mean like 0, a var of 0.9937 but my maximum value is 4.2891?
Can anyone help me out why it does this?
As others have mentioned, there is no bound on the possible values that x can take on in a gaussian distribution. However, the farther x is from the mean, the less likely it is to be drawn.
To give you some intuition for what the variance actually means (for any distribution, not just the gaussian case), you can look at the 68-95-99.7 rule. The rule says:
about 68% of the population will be within one sigma of the mean
about 95% of the population will be within two sigma's of the mean
about 99.7% of the population will be within three sigma's of the mean
Here sigma = sqrt(var) is the standard deviation of the distribution.
So while in theory it is possible to draw any x from a gaussian distribution, in practice it is unlikely to draw anything past 5 or 6 standard deviations away for a population of 100000.
This will yield N random numbers using the gaussian normal distribution.
N = 100;
mu = 0;
sigma = 1;
Xs = normrnd(mu, sigma, N);
EDIT:
I just realized that your code is in fact equivalent to what I've written.
As others already pointed out: variance is not the maximum distance a sample can deviate from the mean! (It is just the average of the squares of those distances)

Solving equations involving dozens of ceil and floor functions in MATLAB?

I am tackling a problem which uses lots of equations in the form of:
where q_i(x) is the only unknown, c_i, C_j, P_j are always positive. We have two cases, the first when c_i, C_j, P_j are integers and the case when they are real. C_j < P_j for all j
How is this type of problems efficently solved in MATLAB especially when the number of iterations N is between 20 - 100?
What I was doing is q_i(x) - c_i(x) must be equal to the summation of integers. So i was doing an exhaustive search for q_i(x) which satisfies both ends of the equation. Clearly this is computationally exhaustive.
What if c_i(x) is a floating point number, this will even make the problem even more difficult to find a real q_i(x)?
MORE INFO: These equations are from the paper "Integrating Preemption Threshold to Fixed Priority DVS Scheduling Algorithms" by Yang and Lin.
Thanks
You can use bisection method to numerically find zeros of almost any well-behavior functions.
Convert your equation problem into a zero-finding problem, by moving all things to one side of the equal sign. Then find x: f(x)=0.
Apply bisection method equation solver.
That's it! Or may be....
If you have specific range(s) where the roots should fall in, then just perform bisection method for each range. If not, you still have to give a maximum estimation (you don't want to try some number larger than that), and make this as the range.
The problem of this method is for each given range it can only find one root, because it's always picking the left (or right) half of the range. That's OK if P_j is integer, as you can always find a minimum step of the function. Say P_j = 1, then only a change in q_i larger than 1 leads to another segment (and thus a possible different root). Otherwise, within each range shorter than 1 there will be at most one solution.
If P_j is an arbitrary number (such as 1e-10), unless you have a lower limit on P_j, most likely you are out of lucky, since you can't tell how fast the function will jump, which essentially means f(x) is not a well-behavior function, making it hard to solve.
The sum is a step function. You can discretize the problem by calculating where the floor function jumps for the next value; this is periodic for every j. Then you overlay the N ''rhythms'' (each has its own speed specified by the Pj) and get all the locations where the sum jumps. Each segment can have exactly 0 or 1 intersection with qi(x). You should visualize the problem for intuitive understanding like this:
f = #(q) 2 + (floor(q/3)*0.5 + floor(q/4)*3 + floor(q/2)*.3);
xx = -10:0.01:10;
plot(xx,f(xx),xx,xx)
For each step, it can be checked analytically if an intersection exists or not.
jumps = unique([0:3:10,0:4:10,0:2:10]); % Vector with position of jumps
lBounds = jumps(1:end-1); % Vector with lower bounds of stairs
uBounds = jumps(2:end); % Vector with upper bounds of stairs
middle = (lBounds+uBounds)/2; % center of each stair
fStep = f(middle); % height of the stairs
intersection = fStep; % Solution of linear function q=fStep
% Check if intersection is within the bounds of the specific step
solutions = intersection(intersection>=lBounds & intersection<uBounds)
2.3000 6.9000

Detect steps in a Piecewise constant signal

I have a piecewise constant signal shown below. I want to detect the location of step transition (Marked in red).
My current approach:
Smooth signal using moving average filter (http://www.mathworks.com/help/signal/examples/signal-smoothing.html)
Perform Discrete Wavelet transform to get discontinuities
Locate the discontinuities to get the location of step transition
I am currently implementing the last step of detecting the discontinuities. However, I cannot get the precise location and end with many false detection.
My question:
Is this the correct approach?
If yes, can someone shed some info/ algorithm to use for the last step?
Please suggest an alternate/ better approach.
Thanks
Convolve your signal with a 1st derivative of a Gaussian to find the step positions, similar to a Canny edge detection in 1-D. You can do that in a multi-scale approach, starting from a "large" sigma (say ~10 pixels) detect local maxima, then to a smaller sigma (~2 pixels) to converge on the right pixels where the steps are.
You can see an implementation of this approach here.
If your function is really piecewise constant, why not use just abs of diff compared to a threshold?
th = 0.1;
x_steps = x(abs(diff(y)) > th)
where x a vector with your x-axis values, y is your y-axis data, and th is a threshold.
Example:
>> x = [2 3 4 5 6 7 8 9];
>> y = [1 1 1 2 2 2 3 3];
>> th = 0.1;
>> x_steps = x(abs(diff(y)) > th)
x_steps =
4 7
Regarding your point 3: (Please suggest an alternate/ better approach)
I suggest to use a Potts "filter". This is a variational approach to get an accurate estimation of your piecewise constant signal (similar to the total variation minimization). It can be interpreted as adaptive median filtering. Given the Potts estimate u, the jump points are the points of non-zero gradient of u, that is, diff(u) ~= 0. (There are free Matlab implementations of the Potts filters on the web)
See also http://en.wikipedia.org/wiki/Step_detection
Total Variation Denoising can produce a piecewise constant signal. Then, as pointed out above, "abs of diff compared to a threshold" returns the position of the transitions.
There exist very efficient algorithms for TVDN that process millions of data points within milliseconds:
http://www.gipsa-lab.grenoble-inp.fr/~laurent.condat/download/condat_fast_tv.c
Here's an implementation of a variational approach with python and matlab interface that also uses TVDN:
https://github.com/qubit-ulm/ebs
I think, smoothing with a sharper lowpass filter should work better.
Try to use medfilt1() (a median filter) instead, since you have very concrete levels. If you know how long your plateau is, you can take half/quarter of the plateau length for example. Then you would get very sharp edges. The sharp edges should be detectable using a Haar wavelet or even just using simple differentiation.

Minimization of L1-Regularized system, converging on non-minimum location?

This is my first post to stackoverflow, so if this isn't the correct area I apologize. I am working on minimizing a L1-Regularized System.
This weekend is my first dive into optimization, I have a basic linear system Y = X*B, X is an n-by-p matrix, B is a p-by-1 vector of model coefficients and Y is a n-by-1 output vector.
I am trying to find the model coefficients, I have implemented both gradient descent and coordinate descent algorithms to minimize the L1 Regularized system. To find my step size I am using the backtracking algorithm, I terminate the algorithm by looking at the norm-2 of the gradient and terminating if it is 'close enough' to zero(for now I'm using 0.001).
The function I am trying to minimize is the following (0.5)*(norm((Y - X*B),2)^2) + lambda*norm(B,1). (Note: By norm(Y,2) I mean the norm-2 value of the vector Y) My X matrix is 150-by-5 and is not sparse.
If I set the regularization parameter lambda to zero I should converge on the least squares solution, I can verify that both my algorithms do this pretty well and fairly quickly.
If I start to increase lambda my model coefficients all tend towards zero, this is what I expect, my algorithms never terminate though because the norm-2 of the gradient is always positive number. For example, a lambda of 1000 will give me coefficients in the 10^(-19) range but the norm2 of my gradient is ~1.5, this is after several thousand iterations, While my gradient values all converge to something in the 0 to 1 range, my step size becomes extremely small (10^(-37) range). If I let the algorithm run for longer the situation does not improve, it appears to have gotten stuck somehow.
Both my gradient and coordinate descent algorithms converge on the same point and give the same norm2(gradient) number for the termination condition. They also work quite well with lambda of 0. If I use a very small lambda(say 0.001) I get convergence, a lambda of 0.1 looks like it would converge if I ran it for an hour or two, a lambda any greater and the convergence rate is so small it's useless.
I had a few questions that I think might relate to the problem?
In calculating the gradient I am using a finite difference method (f(x+h) - f(x-h))/(2h)) with an h of 10^(-5). Any thoughts on this value of h?
Another thought was that at these very tiny steps it is traveling back and forth in a direction nearly orthogonal to the minimum, making the convergence rate so slow it is useless.
My last thought was that perhaps I should be using a different termination method, perhaps looking at the rate of convergence, if the convergence rate is extremely slow then terminate. Is this a common termination method?
The 1-norm isn't differentiable. This will cause fundamental problems with a lot of things, notably the termination test you chose; the gradient will change drastically around your minimum and fail to exist on a set of measure zero.
The termination test you really want will be along the lines of "there is a very short vector in the subgradient."
It is fairly easy to find the shortest vector in the subgradient of ||Ax-b||_2^2 + lambda ||x||_1. Choose, wisely, a tolerance eps and do the following steps:
Compute v = grad(||Ax-b||_2^2).
If x[i] < -eps, then subtract lambda from v[i]. If x[i] > eps, then add lambda to v[i]. If -eps <= x[i] <= eps, then add the number in [-lambda, lambda] to v[i] that minimises v[i].
You can do your termination test here, treating v as the gradient. I'd also recommend using v for the gradient when choosing where your next iterate should be.