I have an image similar to this one:
and want to remove its underlying baseline so that it looks like:
The image is always different, usually has some peaks and has a total absolute offset and a base surface that is tilted/curved/nonlinear.
I was thinking of using the 1D baseline fitting and subtraction technique that is common for signal spectra: create a 2D baseline image and then numerically subtract it from the original. But I can't quite get my head around it in 2D.
This is an improved version of a question I asked before; this one should be clearer.
It seems to me that some kind of high-pass filter would solve your problem. One way to do this is to apply a blurring filter (a low-pass filter) and subtract the blurred result from the original (the same idea that underlies "unsharp masking"). For the low-pass filtering you could use a convolution with a Gaussian or just a plain box filter. Alternatively, you could use a median filter, which is what I did here:
%% setup
v = 0:0.01:1;
[x,y] = meshgrid(v, v);
z0 = cos(pi*x).*cos(pi*y);z = z0; % "baseline" surface
pks = [1,1; 3,3; 7,5; 2,8; 7, 3]/10;% add 5 peaks
for p=pks';
z = z + exp(-((x-p(1)).^2 + (y-p(2)).^2)/0.02.^2);
end
subplot(221);mesh(x,y,z);title('data');
%% recover "baseline"
z0_ = medfilt2(z, [1,1]*20, 'symmetric'); % low pass filter of your choice
subplot(222);mesh(x,y,z0_);title('recovered baseline');
subplot(223);mesh(x,y,z0_-z0);title('error');
%% subtract recovered baseline
subplot(224);mesh(x,y,z-z0_);title('recovered baseline removed');
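For readers working in Python, the same median-filter idea can be sketched with NumPy alone. This is a minimal sketch rather than a port of `medfilt2`: the helper name `median_baseline` is hypothetical, the window is 21 pixels (odd, so it is centred) instead of 20, and `sliding_window_view` assumes NumPy 1.20 or newer.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def median_baseline(z, size):
    """Estimate a smooth baseline with a size x size median filter (size must be odd)."""
    half = size // 2
    # Mirror the image at its borders, similar in spirit to the 'symmetric' option.
    padded = np.pad(z, half, mode="symmetric")
    windows = sliding_window_view(padded, (size, size))
    return np.median(windows, axis=(-2, -1))


# Same synthetic test data as above: a curved baseline plus five narrow peaks.
v = np.linspace(0.0, 1.0, 101)
x, y = np.meshgrid(v, v)
z0 = np.cos(np.pi * x) * np.cos(np.pi * y)
z = z0.copy()
for px, py in [(0.1, 0.1), (0.3, 0.3), (0.7, 0.5), (0.2, 0.8), (0.7, 0.3)]:
    z += np.exp(-((x - px) ** 2 + (y - py) ** 2) / 0.02 ** 2)

baseline = median_baseline(z, 21)   # recovered baseline
flattened = z - baseline            # data with the baseline removed
print("largest baseline error:", np.abs(baseline - z0).max())
```

The window has to be clearly larger than the widest peak, otherwise the median starts to follow the peaks instead of the background.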
Previous answers have suggested interesting mathematical methods for removing the baseline. But I guess this question is a continuation of your previous questions, and by "image" you mean that your data is really an image. If so, you can use image processing techniques to find the peaks and flatten the areas around them.
1. Preprocessing
Before applying different filters, it is better to map the pixel values to a fixed range. This way we have better control over the values of the filter parameters.
First we convert the image data type to double, for cases when the pixel values are integers.
I = double(I);
Then, by applying the average filter, we reduce the noise in the image.
SI = imfilter(I,fspecial('disk',40),'replicate');
Finally, we map the values of all the pixels to the range of zero to one.
NI = SI-min(SI(:));
NI = NI/max(NI(:));
2. Segmentation
After preparing the image, we can identify the parts where each of the peaks is located. To do this, we first calculate the image gradient.
G = imgradient(NI,'sobel');
Then we identify the parts of the image that have a higher slope. Since "high slope" may have different meanings in different images, we use the graythresh function to divide the image into two parts, low slope and high slope.
SA = im2bw(G, graythresh(G));
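As an illustration of what this thresholding step does, here is a small NumPy sketch. `graythresh` uses Otsu's method; the function `otsu_threshold` below is a hypothetical re-implementation for illustration only, and `np.gradient` (central differences) stands in for the Sobel gradient, so the numbers will not match MATLAB's exactly.

```python
import numpy as np


def otsu_threshold(values, bins=256):
    """Return the histogram split that maximises the between-class variance."""
    hist, edges = np.histogram(values.ravel(), bins=bins)
    centers = 0.5 * (edges[:-1] + edges[1:])
    weight_low = np.cumsum(hist)                  # pixels at or below each bin
    weight_high = weight_low[-1] - weight_low     # pixels above each bin
    sum_low = np.cumsum(hist * centers)
    valid = (weight_low > 0) & (weight_high > 0)
    if not valid.any():                           # constant image: nothing to split
        return edges[-1]
    mean_low = sum_low[valid] / weight_low[valid]
    mean_high = (sum_low[-1] - sum_low[valid]) / weight_high[valid]
    between = weight_low[valid] * weight_high[valid] * (mean_low - mean_high) ** 2
    return edges[1:][valid][np.argmax(between)]


# Synthetic example: a gently tilted plane with one Gaussian peak in the middle.
yy, xx = np.mgrid[0:100, 0:100]
surface = 0.01 * xx + 5.0 * np.exp(-((xx - 50) ** 2 + (yy - 50) ** 2) / (2 * 6.0 ** 2))
gy, gx = np.gradient(surface)
grad = np.hypot(gx, gy)                           # gradient magnitude
steep = grad > otsu_threshold(grad)               # True on the flanks of the peak
print("fraction of steep pixels:", steep.mean())
```

In this example the flanks of the peak are marked as steep, while the flat background and the very top of the peak are not.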
The segmented areas in the previous step can have several problems:
Small connected components that are categorized as part of the high-slope area may be caused only by noise. Therefore, components with an area below a threshold value should be removed.
Because the slope reaches zero at the top of a peak, there will likely be holes in the components found in the previous step.
The slope of a peak is not necessarily the same along its boundary, so the found areas can have irregular shapes. One solution is to expand them by replacing them with their convex hulls.
[L, nPeaks] = bwlabel(SA);
minArea = 0.03*numel(I);
P = false(size(I));
for i=1:nPeaks
P_i = bwconvhull(L==i);
area = sum(P_i(:));
if (area>minArea)
P = P|P_i;
end
end
3. Removing Baseline
The P matrix calculated in the previous step contains ones at the peaks and zeros elsewhere. At this point we could already remove the baseline by multiplying this matrix element-wise with the original image. But it is better to first soften the edges of the found areas so that the edges of the peaks do not drop abruptly to zero.
FC = imfilter(double(P),fspecial('disk',50),'replicate');
F = I.*FC;
You can also shift the peaks down by the smallest image value found along their edges.
E = bwmorph(P, 'remove');
o = min(I(E));
T = max(0, F-o);
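The softening and shifting in this step can be sketched in NumPy as follows. This is only an approximation of the MATLAB code: a square box blur replaces the disk-shaped average filter, `mask_edge` is a hypothetical stand-in for `bwmorph(P, 'remove')` that keeps mask pixels with at least one 4-connected neighbour outside the mask, and all function names are assumptions made for this example.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def box_blur(a, size):
    """Average over a size x size window (size odd), replicating border values."""
    half = size // 2
    padded = np.pad(a, half, mode="edge")
    return sliding_window_view(padded, (size, size)).mean(axis=(-2, -1))


def mask_edge(mask):
    """Keep only mask pixels that have a 4-connected neighbour outside the mask."""
    p = np.pad(mask, 1, mode="edge")
    interior = p[:-2, 1:-1] & p[2:, 1:-1] & p[1:-1, :-2] & p[1:-1, 2:]
    return mask & ~interior


def flatten_outside_peaks(image, mask, blur_size=15):
    weights = box_blur(mask.astype(float), blur_size)   # soft 0..1 coefficients
    flattened = image * weights                         # suppress the background
    offset = image[mask_edge(mask)].min()               # lowest value on the peak edges
    return np.maximum(0.0, flattened - offset)          # shift down and clip at zero


# Synthetic example: tilted plane with one peak and a circular peak mask.
yy, xx = np.mgrid[0:64, 0:64]
r2 = (xx - 32) ** 2 + (yy - 32) ** 2
image = 1.0 + 0.01 * xx + 5.0 * np.exp(-r2 / (2 * 5.0 ** 2))
mask = r2 <= 12 ** 2
result = flatten_outside_peaks(image, mask)
print(result[0, 0], result[32, 32])
```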
All the above steps in one function
function [hlink, T] = removeBaseline(I, demoSteps)
% converting image to double
I = double(I);
% smoothing image to reduce noise
SI = imfilter(I,fspecial('disk',40),'replicate');
% normalizing image in [0..1] range
NI = SI-min(SI(:));
NI = NI/max(NI(:));
% computing image gradient
G = imgradient(NI,'sobel');
% finding steep areas of the image
SA = im2bw(G, graythresh(G));
% segmenting image to find regions covered by each peak
[L, nPeaks] = bwlabel(SA);
% defining a threshold for minimum area covered by each peak
minArea = 0.03*numel(I);
% filling each of the regions, and eliminating small ones
P = false(size(I));
for i=1:nPeaks
% finding convex hull of the region
P_i = bwconvhull(L==i);
% computing area of the filled region
area = sum(P_i(:));
if (area>minArea)
% adding the region to peaks mask
P = P|P_i;
end
end
% applying the average filter on peaks mask to compute coefficients
FC = imfilter(double(P),fspecial('disk',50),'replicate');
% Removing baseline by multiplying the coefficients
F = I.*FC;
% finding edge of peaks
E = bwmorph(P, 'remove');
% finding minimum value of edges in the image
o = min(I(E));
% shifting the flattened image
T = max(0, F-o);
if demoSteps
figure
subplot 231, imshow(I, []); title('Original Image');
subplot 232, imshow(SI, []); title('Smoothed Image');
subplot 233, imshow(NI); title('Normalized in [0..1]');
subplot 234, imshow(G, []); title('Gradient of Image');
subplot 235, imshow(SA); title('Steep Areas');
subplot 236, imshow(P); title('Peaks');
figure;
subplot 221, imshow(FC); title('Flattening Coefficients');
subplot 222, imshow(F, []); title('Base Line Removed');
subplot 223, imshow(E); title('Peak Edge');
subplot 224, imshow(T, []); title('Final Result');
figure
h1 = subplot(1, 3, 1);
surf(I, 'edgecolor', 'none'); hold on;
contour3(I, 'k', 'levellist', o, 'linewidth', 2)
h2 = subplot(1, 3, 2);
surf(F, 'edgecolor', 'none'); hold on;
contour3(F, 'k', 'levellist', o, 'linewidth', 2)
h3 = subplot(1, 3, 3);
surf(T, 'edgecolor', 'none');
hlink = linkprop([h1 h2 h3],{'CameraPosition','CameraUpVector', 'xlim', 'ylim', 'zlim', 'clim'});
set(h1, 'zlim', [0 max(I(:))])
set(h1, 'ylim', [0 size(I, 1)])
set(h1, 'xlim', [0 size(I, 2)])
set(h1, 'clim', [0 max(I(:))])
end
end
To execute the function with an image containing several peaks with noise:
close all; clc; clear variables;
I = abs(peaks(1200));
J1 = imnoise(ones(size(I))*0.5,'salt & pepper', 0.05);
J1 = imfilter(double(J1),fspecial('disk',20),'replicate');
[X, Y] = meshgrid(linspace(0, 1, size(I, 2)), linspace(0, 1, size(I, 1)));
J2 = X.^3+Y.^2;
I = max(I, 2*J2) + 5*J1;
lp3 = removeBaseline(I, true);
To call the function for an image read from file:
I = rgb2gray(imread('imagefile.jpg'));
[~, I2] = removeBaseline(I, true);
Results for images provided in previous questions:
I have a solution in Python, but I guess it would not be too complicated to transfer this to MATLAB.
It works with a bunch of peaks. I made a few assumptions, though, like that there are several peaks. It works with one, but is better if there are a few peaks. Peaks may overlap. The main assumption is of course the shape of the background, but this can be modified if other models exist.
The main idea is to subtract a function but forbidding negative values. This is done via an extra cost function, which I keep differentiable for the sake of minimization. As a consequence, there might be issues for values near zero. Such cases can be handled by iterating on how sharp the extra cost comes in. One would start with a moderate slope of about one and re-do the fit with a steeper slope and starting values from the previous fit. I've done that on similar problems and it works ok. Technically, it is not excluded that there are small negative values after subtracting the fit-solution, so for image data an extra step would be necessary to take care of that.
Here is the code
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # noqa: F401, registers the 3d projection on older matplotlib
import numpy as np
from scipy.optimize import least_squares


def peak( x, y, a, x0, y0, s ):
    """
    Just a symmetric peak for testing
    """
    return a * np.exp( -( ( x - x0 )**2 + ( y - y0 )**2 ) / 2 / s**2 )


def second_order( xx, yy, aa, bb, cc, dd, ee, ff ):
    """
    Assuming that the base can be approximated by a second order equation
    generalization to higher orders should be straight forward
    """
    out = aa * xx**2 + 2 * bb * xx * yy + cc * yy**2 + dd * xx + ee * yy + ff
    return out


def residual_function( params, xa, ya, za, extracost, slope ):
    """
    cost function. Calculates difference from zero-plane
    with ultra high cost for negative values.
    previous solutions to similar problems have shown that sometimes
    the optimization process has to be iterated with increasing
    parameter slope ( and maybe extracost )
    """
    aa, bb, cc, dd, ee, ff = params
    ### subtract such that values become as small as possible
    diffarray = za - second_order( xa, ya, aa, bb, cc, dd, ee, ff )
    diffarray = diffarray.flatten( )
    ### BUT high costs for negative values (vectorized; np.float no longer exists in NumPy)
    cost = -extracost * ( np.tanh( slope * diffarray ) - 1 ) / 2.0
    return np.abs( cost ) + np.abs( diffarray )


### some test data
xl = np.linspace( -3, 5, 50 )
yl = np.linspace( -1, 7, 60 )
XX, YY = np.meshgrid( xl, yl )
VV = second_order( XX, YY, 0.1, 0.2, 0.08, 0.28, 1.9, 1.3 )
VV = VV + peak( XX, YY, 65, 1, 2, 0.3 )
# ~VV = VV + peak( XX, YY, 55, 3, 4, 0.5 )
# ~VV = VV + peak( XX, YY, 55, -1, 0, 0.4 )
# ~VV = VV + peak( XX, YY, 55, -3, 6, 0.7 )

### the optimization
result = least_squares( residual_function, x0=( 0.0, 0.0, 0.00, 0.0, 0.0, 0 ), args=( XX, YY, VV, 1e4, 50 ) )
print( result )
print( result.x )
subtractme = second_order( XX, YY, *( result.x ) )
nobase = VV - subtractme

### plotting
fig = plt.figure()
ax = fig.add_subplot( 1, 2, 1, projection='3d' )
ax.plot_surface( XX, YY, VV )
bx = fig.add_subplot( 1, 2, 2, projection='3d' )
bx.plot_surface( XX, YY, nobase )
plt.show()
provides
<< [0.092135 0.18974991 0.06144773 0.37054049 2.05096262 0.88314363]
and
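To make the role of the `slope` parameter in `residual_function` more concrete, the extra cost term can be looked at in isolation. The sketch below only re-evaluates that one expression; the helper name `negative_penalty` and the sample residuals are chosen here for illustration.

```python
import numpy as np


def negative_penalty(diff, extracost, slope):
    """Smooth step: about extracost for diff << 0, about 0 for diff >> 0."""
    return extracost * (1.0 - np.tanh(slope * diff)) / 2.0


residuals = np.array([-0.5, 0.0, 0.05, 0.5])   # data minus fitted baseline
for slope in (1, 10, 50):
    print(slope, negative_penalty(residuals, 1e4, slope))
```

With a small slope the penalty is still large for clearly positive residuals, so the step is soft and easy to optimize over, but it also pushes the fitted surface further below the data than necessary. With a large slope the step is sharp. At a residual of exactly zero the penalty is always `extracost / 2`, which fits the remark above that values near zero can cause issues and that refitting with an increasing slope can help.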
I want to add a known vertical error bar to each data point in a gscatter plot. I have plotted the grouped scatter (to specify the colour) with the calculated mean. How should I do this?
This is my current code
Mydata = readtable('D:\Download\Book1.xlsv');
y = Mydata.Y;
x = Mydata.X;
g = Mydata.Category;
sz = 10; % marker size (renamed from "size" to avoid shadowing the built-in function)
h = gscatter(x, y, g, 'rkgb', 'X', sz);
I don't think that Matlab's scatterplot supports errorbars from within the scatter function itself. I think that some more manual work should be done. Here is a working example with 2 categories, made simple with a loop (you can apply this to much more than just 2 categories)
Y = [4,3,4,2,10,9,11]; % some invented Y data
X = [1,2,3,7,6,9,8]; % some invented X data
groups = [0, 1]; % 2 groups/categories
G = [0,0,0,1,1,1,1]; % categories of data
E = [0.1, 0.4, 0.2, 0.5, 0.9, 0.7, 1]; % errors
colors = {'r', 'k'};
figure, gscatter (X,Y,G,'rk','X',10);
hold on
for i = 1:length(groups)
errorbar(X(G==groups(i)),Y(G==groups(i)),E(G==groups(i)),'LineStyle','None','Color',colors{i})
end
I have data in which y and x do not follow a linear trend. The data are as follows; if you plot y as a function of x, the plot is nonlinear.
x= [45.5976, 45.6311, 45.6599, 45.679, 45.703, 45.7461, 45.7749]
y = [0.17, 1.7, 5.1, 17, 51, 170, 510]
plot(x,y,'o')
My goal is to find an optimum value of b that makes log(y) a linear function of log((x-b)/b). In other words, plot(log((x-b)/b),log(y)) should produce a straight line.
Since I don't have enough reputation to add a comment to clarify the question, I'm trying to help in an answer. Also, typically when transforming data to fit a linear regression, if your original model is: y = b0 + b1x, then taking logs of both the predictor and response gives a new model y* = b0 + b1x* where y* = ln(y) and x* = ln(x). Why did you decide your model should be of the form: ln(y) = ln((x-b)/b)?
In any case, to find the optimal beta values for such a model in Matlab you would do something like the following:
x= [45.5976, 45.6311, 45.6599, 45.679, 45.703, 45.7461, 45.7749]';
y = [0.17, 1.7, 5.1, 17, 51, 170, 510]';
figure(1), plot(x,y,'o');
ln_y = log(y);
ln_x = log(x);
figure(2), plot(ln_x, ln_y, 'x');
ln_X = [ones(length(ln_x),1) ln_x];
B = ln_X\ln_y;
ln_y_fitted = ln_X*B;
figure(2),
hold on
plot(ln_x, ln_y_fitted, '--', 'Color', 'r');
Given the above code, if you want to plot the various results for log(y) = log((x-b)/b), you can use something like this:
for b = 0.1:0.1:4
ln_x = log((x-b)/b);
figure, plot(ln_x, ln_y, 'x');
end
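Instead of judging the plots by eye, one could score each candidate b by how linear the transformed data are. The NumPy sketch below uses the squared correlation coefficient between log((x-b)/b) and log(y) as that score. The scan range is an assumption made here: b has to stay below min(x) so that the logarithm is defined, which is a different range from the 0.1 to 4 used in the loop above.

```python
import numpy as np

x = np.array([45.5976, 45.6311, 45.6599, 45.679, 45.703, 45.7461, 45.7749])
y = np.array([0.17, 1.7, 5.1, 17, 51, 170, 510])
log_y = np.log(y)


def r_squared(b):
    """Squared correlation between log((x-b)/b) and log(y); 1 means perfectly linear."""
    u = np.log((x - b) / b)
    return np.corrcoef(u, log_y)[0, 1] ** 2


# Candidate values of b; they must satisfy 0 < b < min(x).
candidates = np.linspace(0.1, x.min() - 1e-3, 2000)
scores = np.array([r_squared(b) for b in candidates])
best_b = candidates[np.argmax(scores)]
best_r2 = scores.max()
print("best b:", best_b, "R^2:", best_r2)
```

A grid scan like this only gives a rough location; the best candidate could then be refined with a finer grid or a 1D optimizer.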
In the image posted below, I am trying to get a time-frequency representation (TFR) using the STFT. In the code posted, I specified the parameter T = 0:.001:1; when I change it to, for example, T = 0:.001:2; the range of values on the horizontal axis of the plot changes, even though that axis is labelled Frequency.
Now, I want to change the ranges of values of the horizontal and the vertical axes on the shown plot. How can I do that?
NOTE: the code used to generate the shown plot is:
T = 0:.001:1;
spectrogram(x4,128,50,NFFT);
CODE:
% Time specifications:
Fs = 8000; % samples per second
dt = 1/Fs; % seconds per sample
StopTime = 1; % seconds
t = (0:dt:StopTime-dt); % seconds
t1 = (0:dt:.25);
t2 = (.25:dt:.50);
t3 = (.5:dt:.75);
t4 = (.75:dt:1);
%two freqs. = abs(f1 - f2), that's why, x1 is plotted with 2 freqs.
x1 = (10)*sin(2*pi*30*t1);
x2 = (10)*sin(2*pi*60*t2) + x1;
x3 = (10)*sin(2*pi*90*t3) + x2;
x4 = (10)*sin(2*pi*120*t4) + x3;
%x5 = (10) * sin(2*pi*5*t5);
%x6 = x1 + x2 + x3 + x4 + x5;
NFFT = 2 ^ nextpow2(length(t)); % Next power of 2 from length of y
Y = fft(x3, NFFT);
f = Fs / 2 * linspace(0, 1, NFFT/2 + 1);
figure;
plot(f(1:200), 2 * abs( Y( 1:200) ) );
T = 0:.001:1;
spectrogram(x4,10,9,31);
axis(get(gcf,'children'), [0, 1,0,100]);
% Plot the signal versus time:
figure;
xlabel('time (in seconds)');
ylabel('Amplitude');
title('non-stationary Signal versus Time');
hold on
plot(t1,x1,'r');
plot(t2,x2,'g');
plot(t3,x3,'b');
plot(t4,x4,'black');
%plot(t5,x5,'y');
%plot(t, x6,'black');
legend('x1 = (10)*sin(2*pi*15*t1) + (10)*sin(2*pi*8*t1)', 'x2 = (10)*sin(2*pi*25*t2) + x1', ...
'x3 = (10)*sin(2*pi*50*t3) + x2', 'x4 = (10)*sin(2*pi*75*t4) + x3', ...
'Location', 'SouthWest');
Idea: get the axes that were used to plot the spectrogram and set their properties accordingly. For example, supposing you want to restrict the x range to [0, 0.5] and the y range to [100, 200]:
%'old code here'
%' . . . '
spectrogram(x4,128,50,NFFT);
%'new code here'
axis(get(gcf,'children'), [0, 0.5, 100, 200]);
Explanation: The added one-liner gets the child handle of the current figure gcf (which is assumed to have been created by spectrogram), then sets its range to [xmin, xmax, ymin, ymax] via the axis call.
Nota Bene: I assume that you just need to re-scale the axis, not re-compute the spectrogram for different data.
Also I assume the spectrogram doesn't share its figure with other axes.
Also, extending the axis range rather than restricting it might not give you the expected results (in a word: it is ugly).
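If the goal is to work with the restricted range as data rather than only to zoom the plot, the same restriction can be applied to the arrays themselves. The NumPy sketch below is an assumption-laden illustration and not what `spectrogram` does internally: `stft_magnitude` is a hypothetical helper, and the window length (1024), overlap (512) and test signal were chosen here so that the 120 Hz tone is well resolved.

```python
import numpy as np


def stft_magnitude(signal, fs, window_length, overlap, nfft):
    """Magnitude STFT with a Hamming window; returns (freqs in Hz, times in s, |S|)."""
    hop = window_length - overlap
    window = np.hamming(window_length)
    starts = np.arange(0, len(signal) - window_length + 1, hop)
    frames = np.stack([signal[s:s + window_length] * window for s in starts])
    spectrum = np.abs(np.fft.rfft(frames, n=nfft, axis=1))
    freqs = np.fft.rfftfreq(nfft, d=1.0 / fs)
    times = (starts + window_length / 2) / fs      # centre of each frame
    return freqs, times, spectrum.T                # rows: frequency, columns: time


fs = 8000
t = np.arange(0, 1, 1 / fs)
x = 10 * np.sin(2 * np.pi * 120 * t)               # single 120 Hz tone as test input

freqs, times, mag = stft_magnitude(x, fs, 1024, 512, 1024)

# Keep only 100..200 Hz and the first half second.
f_keep = (freqs >= 100) & (freqs <= 200)
t_keep = times <= 0.5
cropped = mag[np.ix_(f_keep, t_keep)]
strongest = freqs[f_keep][cropped.mean(axis=1).argmax()]
print(cropped.shape, strongest)
```

Passing the sampling rate explicitly is what puts the frequency axis in Hz and the time axis in seconds here.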
Please find the data in the link below, or if you can send me your private email, I can send you the data
https://dl.dropboxusercontent.com/u/5353938/test_matlab_lefou.xlsx
In the excel sheet, the first column is y, the second is x and the third is t, I hope this will make things much more clear, and many thanks for the help.
I need to use the following model because it is the one that best fits my data, but I don't know how to find the values of a and b that give the best fit (I can attach a file if you need the values). I already have the values of y, x and t:
y = a*sqrt(x).*exp(b*t)
Thanks
Without depending on the Curve Fitting Toolbox, this problem can also be solved with fminsearch. I first generate some data, which you already have but didn't share with us. An initial guess for the parameters a and b must be made (p0). Then I do the optimization by minimizing the squared errors between data and fit, which yields the vector p_fit containing the optimized values of a and b. In the end, the result is visualized.
% ----- Generating some data for x, y and t (which you already got)
N = 10; % num of data points
x = linspace(0,5,N);
t = linspace(0,10,N);
% random parameters
a = rand()*5; % a between 0 and 5
b = (rand()-1); % b between -1 and 0
y = a*sqrt(x).*exp(b*t) + rand(size(x))*0.1; % noisy data
% ----- YOU START HERE WITH YOUR PROBLEM -----
% put x and t into a 2 row matrix for simplicity
D(1,:) = x;
D(2,:) = t;
% create model function with parameters p(1) = a and p(2) = b
model = @(p, D) p(1)*sqrt(D(1,:)).*exp(p(2)*D(2,:));
e = @(p) sum((y - model(p,D)).^2); % minimize squared errors
p0 = [1,-1]; % an initial guess (positive a and probably negative b for a decay)
[p_fit, r1] = fminsearch(e, p0); % Optimize
% ----- VISUALIZATION ----
figure
plot(x,y,'ko')
hold on
X = linspace(min(x), max(x), 100);
T = linspace(min(t), max(t), 100);
plot(X, model(p_fit, [X; T]), 'r--')
legend('data', sprintf('fit: y(t,x) = %.2f*sqrt(x)*exp(%.2f*t)', p_fit))
The result can look like
UPDATE AFTER MANY MANY COMMENTS
Your data are column vectors, while my solution used row vectors. The error occurred when the error function tried to compute the difference between a column vector (y) and a row vector (the result of the model function). Easy hack: turn them all into row vectors and use my approach. The result is: a = 0.5296 and b = 0.0013.
However, the optimization depends on the initial guess p0, so you might want to play around with it a little bit.
clear variables
load matlab.mat
% put x and t into a 2 row matrix for simplicity
D(1,:) = x;
D(2,:) = t;
y = reshape(y, 1, length(y)); % <-- also y is a row vector, now
% create model function with parameters p(1) = a and p(2) = b
model = @(p, D) p(1)*sqrt(D(1,:)).*exp(p(2)*D(2,:));
e = @(p) sum((y - model(p,D)).^2); % minimize squared errors
p0 = [1,0]; % an initial guess (positive a and probably negative b for a decay)
[p_fit, r1] = fminsearch(e, p0); % Optimize
% p_fit = nlinfit(D, y, model, p0) % as a working alternative with dependency on the statistics toolbox
% ----- VISUALIZATION ----
figure
plot(x,y,'ko', 'markerfacecolor', 'black', 'markersize',5)
hold on
X = linspace(min(x), max(x), 100);
T = linspace(min(t), max(t), 100);
plot(X, model(p_fit, [X; T]), 'r-', 'linewidth', 2)
legend('data', sprintf('fit: y(t,x) = %.2f*sqrt(x)*exp(%.2f*t)', p_fit))
The result doesn't look too satisfying though. But that mainly is because of your data. Have a look here:
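Since the fit depends on the initial guess p0, one way to get a data-driven starting point is to linearize the model: for x > 0 and y > 0, log(y/sqrt(x)) = log(a) + b*t is a straight line in t. The NumPy sketch below uses that trick. It is a different technique from the fminsearch fit above (it minimizes errors in log space, not in y), the function name `initial_guess` is hypothetical, and the synthetic data are made up for the example.

```python
import numpy as np


def initial_guess(x, t, y):
    """Log-linearised estimate of (a, b) for y = a*sqrt(x)*exp(b*t); uses points with x>0, y>0."""
    keep = (x > 0) & (y > 0)
    b, log_a = np.polyfit(t[keep], np.log(y[keep] / np.sqrt(x[keep])), 1)
    return np.exp(log_a), b


# Synthetic data with 1 % multiplicative noise (made up for this example).
rng = np.random.default_rng(0)
x = np.linspace(0.5, 5, 20)
t = np.linspace(0, 10, 20)
a_true, b_true = 2.0, -0.3
y = a_true * np.sqrt(x) * np.exp(b_true * t) * (1 + 0.01 * rng.standard_normal(x.size))

a0, b0 = initial_guess(x, t, y)
print(a0, b0)
```

The pair (a0, b0) could then be passed as p0 to the nonlinear fit.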
With the cftool command (Curve Fitting Toolbox) you can fit your own functions and get back the parameters you need (a, b). Make sure your x-data and y-data are in separate variables. You can also specify weights for your measurements.