Representing an exponential function with Fourier series - matlab

I am trying to represent a double exponential function with a Fourier series in MATLAB, but without the DC component. The basic function has the form K*(exp(-t.*alpha) - exp(-t.*beta)), and the expected result should look like the double exponential defined by the time-domain function below. I used the Heaviside function to build the rectangular windows and to invert the waveform in the second half-period.
clear, clc
Tr = 1.5E-6; Tf = 50E-6;
al=1/Tf; be=1/Tr; % alpha and beta
f0 = 1/(8*Tf);
T = 1/f0; % fundamental period
K = 1; % Amplitude
t=0:Tr/4:T; % Time
w0 = 2*pi/T; % Fundamental freq (in rad/s)
uV = 1E-6; % 1 uV = 1E-6 V
%% Double exponential - TD continuous
DEXP_TD_parameters = K* (exp(-t.*al) - exp(-t.*be));
DEXP_TD = K* (exp(-t.*al) - exp(-t.*be)).* (heaviside(t)-heaviside(t-T/2)) ...
- K* (exp(-(t-T/2).*al) - exp(-(t-T/2).*be)).*(heaviside(t-T/2)-heaviside(t-T));
Then, I calculated the ck of the series and plotted the time domain function again.
%% DEXP - Fourier coefficient c_k
h = 150; % Number of harmonics
for m = 1:h
    if mod(m,2) == 0
        ck(m) = NaN; % even harmonics are skipped in the reconstruction
    else
        ck(m) = (K/pi) * ( (1+exp(-al*pi/w0)) / (1j*m+al/w0) ...
            -(1+exp(-be*pi/w0)) / (1j*m+be/w0) );
        ck_n(h-m+1) = (K/pi) * ( (1+exp(-al*pi/w0)) / (1j*(-m)+al/w0) ...
            -(1+exp(-be*pi/w0)) / (1j*(-m)+be/w0) );
    end
end
ck_n(ck_n == 0) = NaN;
%% FD/TD Reconstruction
DEXP_TD_rec = zeros(1,(length(t)));
summ = zeros(1,h);
if mod(h,2) == 0
else hk=h;
end
for m = 1:length(t)
    for k = 1:2:h(end)
        % DEXP_sum(k) = (ck(k)) * (exp(1j*w0*k*t(m)) + exp(-1j*w0*k*t(m)));
        DEXP_sum(k) = 2*abs(ck(k)) * (cos(k*w0.*t(m) + angle(ck(k))));
    end
    DEXP_TD_rec(m) = sum(DEXP_sum);
end
%% Plots
figure(1) % TD
plot(t/uV, DEXP_TD,'r', t/uV, abs(DEXP_TD_rec),'--b')
The time domain Fourier series waveform (in blue) does not match the red curve, and I wonder why. The shape is okay, but the second half of the curve should be negative. Any tips on how to solve it?
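One thing that may be worth checking (a hedged observation, not a confirmed diagnosis): the plot command wraps the reconstruction in abs(), and abs() turns every negative sample into a positive one. The sketch below uses a hypothetical coefficient c1 and a toy single-harmonic signal, neither taken from the code above, to show that the one-sided cosine form equals the two-sided complex sum, is already real and signed, and only loses its negative half once abs() is applied.

```matlab
% Toy check with a hypothetical coefficient (not taken from the post).
T  = 1;                      % period
w0 = 2*pi/T;                 % fundamental frequency in rad/s
t  = linspace(0, T, 201);    % one period
c1 = 0.3 - 0.4j;             % assumed complex coefficient for k = 1

% Two-sided (complex) form and one-sided (cosine) form of the same harmonic
x_complex = c1*exp(1j*w0*t) + conj(c1)*exp(-1j*w0*t);
x_cosine  = 2*abs(c1)*cos(w0*t + angle(c1));

max_diff     = max(abs(x_complex - x_cosine));  % expected at round-off level
has_negative = any(x_cosine < 0);               % the cosine form is signed
abs_negative = any(abs(x_cosine) < 0);          % abs() leaves no negative samples
```

If the same holds for the full series, plotting DEXP_TD_rec without abs() would be the first thing to try.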


How do I find local threshold for coefficients in image compression using DWT in MATLAB

I'm trying to write an image compression script in MATLAB using a multilevel 3D DWT (color image). Along the way, I want to apply thresholding to the coefficient matrices, using both global and local thresholds.
I would like to use the formula below to calculate my local threshold:
T = sigma * sqrt(2 * log2(N))
where sigma is the variance and N is the number of elements.
Global thresholding works fine; but my problem is that the calculated local threshold is (most often!) greater than the maximum band coefficient, therefore no thresholding is applied.
Everything else works fine and I get a result too, but I suspect the local threshold is miscalculated. Also, the resulting image is larger than the original!
I'd appreciate any help on the correct way to calculate the local threshold, or if there's a pre-set MATLAB function.
here's an example output:
here's my code:
%%%%% COMPRESSION %%%%%
% read base image
% dwt 3/5-L on base images
% quantize coeffs (local/global)
% count zero value-ed coeffs
% calculate mse/psnr
% save and show result
% read images
base = imread('circ.jpg');
fam = 'haar'; % wavelet family
lvl = 3; % wavelet depth
% set to 1 to apply global thr
thr_type = 0;
% global threshold value
gthr = 180;
% convert base to grayscale
%base = rgb2gray(base);
% apply dwt on base image
dc = wavedec3(base, lvl, fam);
% extract coeffs
ll_base = dc.dec{1};
lh_base = dc.dec{2};
hl_base = dc.dec{3};
hh_base = dc.dec{4};
ll_var = var(ll_base, 0);
lh_var = var(lh_base, 0);
hl_var = var(hl_base, 0);
hh_var = var(hh_base, 0);
% count number of elements
ll_n = numel(ll_base);
lh_n = numel(lh_base);
hl_n = numel(hl_base);
hh_n = numel(hh_base);
% find local threshold
ll_t = ll_var * (sqrt(2 * log2(ll_n)));
lh_t = lh_var * (sqrt(2 * log2(lh_n)));
hl_t = hl_var * (sqrt(2 * log2(hl_n)));
hh_t = hh_var * (sqrt(2 * log2(hh_n)));
% global
if thr_type == 1
    ll_t = gthr; lh_t = gthr; hl_t = gthr; hh_t = gthr;
end
% count zero values in bands
ll_size = size(ll_base);
lh_size = size(lh_base);
hl_size = size(hl_base);
hh_size = size(hh_base);
% count zero values in new band matrices
ll_zeros = sum(ll_base==0,'all');
lh_zeros = sum(lh_base==0,'all');
hl_zeros = sum(hl_base==0,'all');
hh_zeros = sum(hh_base==0,'all');
% initiate new matrices
ll_new = zeros(ll_size);
lh_new = zeros(lh_size);
hl_new = zeros(hl_size);
hh_new = zeros(hh_size);
% apply thresholding on bands
% if new value < thr => 0
% otherwise, keep the previous value
for id=1:ll_size(1)
    for idx=1:ll_size(2)
        if ll_base(id,idx) < ll_t
            ll_new(id,idx) = 0;
        else
            ll_new(id,idx) = ll_base(id,idx);
        end
    end
end
for id=1:lh_size(1)
    for idx=1:lh_size(2)
        if lh_base(id,idx) < lh_t
            lh_new(id,idx) = 0;
        else
            lh_new(id,idx) = lh_base(id,idx);
        end
    end
end
for id=1:hl_size(1)
    for idx=1:hl_size(2)
        if hl_base(id,idx) < hl_t
            hl_new(id,idx) = 0;
        else
            hl_new(id,idx) = hl_base(id,idx);
        end
    end
end
for id=1:hh_size(1)
    for idx=1:hh_size(2)
        if hh_base(id,idx) < hh_t
            hh_new(id,idx) = 0;
        else
            hh_new(id,idx) = hh_base(id,idx);
        end
    end
end
% count zeros of the new matrices
ll_new_size = size(ll_new);
lh_new_size = size(lh_new);
hl_new_size = size(hl_new);
hh_new_size = size(hh_new);
% count number of zeros among new values
ll_new_zeros = sum(ll_new==0,'all');
lh_new_zeros = sum(lh_new==0,'all');
hl_new_zeros = sum(hl_new==0,'all');
hh_new_zeros = sum(hh_new==0,'all');
% set new band matrices
dc.dec{1} = ll_new;
dc.dec{2} = lh_new;
dc.dec{3} = hl_new;
dc.dec{4} = hh_new;
% count how many coeff. were thresholded
ll_zeros_diff = ll_new_zeros - ll_zeros;
lh_zeros_diff = lh_zeros - lh_new_zeros;
hl_zeros_diff = hl_zeros - hl_new_zeros;
hh_zeros_diff = hh_zeros - hh_new_zeros;
% show coeff. matrices vs. thresholded version
subplot(2,4,1); imagesc(ll_base); title('LL');
subplot(2,4,2); imagesc(lh_base); title('LH');
subplot(2,4,3); imagesc(hl_base); title('HL');
subplot(2,4,4); imagesc(hh_base); title('HH');
subplot(2,4,5); imagesc(ll_new); title({'LL thr';ll_zeros_diff});
subplot(2,4,6); imagesc(lh_new); title({'LH thr';lh_zeros_diff});
subplot(2,4,7); imagesc(hl_new); title({'HL thr';hl_zeros_diff});
subplot(2,4,8); imagesc(hh_new); title({'HH thr';hh_zeros_diff});
% idwt to reconstruct compressed image
cmp = waverec3(dc);
cmp = uint8(cmp);
% calculate mse/psnr
D = abs(cmp - base) .^2;
mse = sum(D(:))/numel(base);
psnr = 10*log10(255*255/mse);
% show images and mse/psnr
imshow(base); title("Original"); axis square;
imshow(cmp); colormap(gray); axis square;
msg = strcat("MSE: ", num2str(mse), " | PSNR: ", num2str(psnr));
% save image locally
imwrite(cmp, 'compressed.png');
I solved the problem.
The sigma in the local threshold formula is not the variance; it is the standard deviation. I applied these steps:
used std2() instead of stdfilt() to find the standard deviation of my coefficient matrices (thanks to @Rotem for pointing this out)
used numel() to count the number of elements in the coefficient matrices
This is a summary of the process for LL; it is the same for the other bands (LH, HL, HH):
[c, s] = wavedec2(image, wname, level); %apply dwt
ll = appcoeff2(c, s, wname); %find LL
ll_std = std2(ll); %find standard deviation
ll_n = numel(ll); %find number of coeffs in LL
ll_t = ll_std * (sqrt(2 * log2(ll_n))); % local threshold formula
ll_new = ll .* double(ll > ll_t); %thresholding
replace the LL values in c in a for loop
reconstruct by applying IDWT using waverec2
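To illustrate why the variance-based threshold ends up above every coefficient, here is a minimal sketch on a made-up band. The matrix `band`, its size, and the injected large values are all hypothetical, and thresholding on the magnitude abs(band) is an assumption of this sketch rather than what the steps above do. It only uses base MATLAB functions.

```matlab
% Hypothetical coefficient band (random numbers, not real DWT output).
rng(0);                               % reproducible example
band = 20*randn(64, 64);              % assumed small "detail" coefficients
band(1:4, 1:4) = 500;                 % assumed 16 large, significant coefficients

n     = numel(band);                  % number of coefficients
sigma = std(band(:));                 % standard deviation over the whole band
t_std = sigma   * sqrt(2*log2(n));    % threshold from the standard deviation
t_var = sigma^2 * sqrt(2*log2(n));    % threshold if the variance is used instead

band_max = max(abs(band(:)));         % largest coefficient magnitude
kept_std = nnz(abs(band) > t_std);    % survivors with the std-based threshold
kept_var = nnz(abs(band) > t_var);    % survivors with the variance-based threshold
```

In this toy case the variance-based threshold exceeds band_max, so nothing survives, while the std-based threshold keeps only the 16 large coefficients.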
this is a sample output:

Finding Percent Error of a Fourier Series

Find the error as a function of n, where the error is defined as the difference between the voltage from the Fourier series (vF(t)) and the value from the ideal function (v(t)), normalized to the maximum magnitude (Vm):
error(t) = (vF(t) - v(t)) / Vm
I am given this prompt, where Vm = 1 V. Below this line is the code I have written.
I am trying to write a function to solve this question: plot the error versus time for n=3, n=5, n=10, and n=50 (10 points). What am I doing incorrectly?
close all;
clear all;
% define the signal parameters
Vm = 1;
T = 1;
w0 = 2*pi/T;
% define the symbolic variables
syms n t;
% define the signal
v1 = Vm*sin(4*pi*t/T);
v2 = 2*Vm*sin(4*pi*t/T);
% evaluate the fourier series integral
an1 = 2/T*int(v1*cos(n*w0*t),0,T/2) + 2/T*int(v2*cos(n*w0*t),T/2,T);
bn1 = 2/T*int(v1*sin(n*w0*t),0,T/2) + 2/T*int(v2*sin(n*w0*t),T/2,T);
a0 = 1/T*int(v1,0,T/2) + 1/T*int(v2,T/2,T);
% obtain C by substituting n in c[n]
nmax = 100;
n = 1:nmax;
a = subs(an1);
b = subs(bn1);
% define the time vector
ts = 1e-2; % ts is the sampling interval
t = 0:ts:3*T-ts;
% directly plot the signal x(t)
t1 = 0:ts:T-ts;
v1 = Vm*sin(4*pi*t1/T).*(t1<=T/2);
v2 = 2*Vm*sin(4*pi*t1/T).*(t1>T/2).*(t1<T);
v = v1+v2;
x = repmat(v,1,3);
% Now fourier series reconstruction
N = [3];
for p = 1:length(N)
    for i = 1:length(t)
        for k = N(p)
            x(k,i) = a(k)*cos(k*w0*t(i)) + b(k)*sin(k*w0*t(i));
        end
        % y(k,i) = a0+sum(x(:,i)); % Add DC term
    end
    z = a0 + sum(x);
end
% Percent error
function [per_error] = percent_error(measured, actual)
    per_error = abs(( (measured - actual) ./ 1) * 100);
end
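For comparison, here is a hedged numerical sketch of the requested quantity. It is not the symbolic approach used above: the coefficients are approximated with trapz on an assumed grid of 2001 samples, all harmonics from 1 to n are summed, and the names nList, vF, and err are hypothetical.

```matlab
% Numerical sketch: error of the n-term Fourier sum, normalized to Vm.
Vm = 1; T = 1; w0 = 2*pi/T;
t = linspace(0, T, 2001);                      % one period (assumed grid)
v = Vm*sin(4*pi*t/T).*(t <= T/2) ...
  + 2*Vm*sin(4*pi*t/T).*(t > T/2);             % piecewise signal from the question

a0    = trapz(t, v)/T;                         % DC term
nList = [3 5 10 50];                           % harmonic counts to compare
err   = zeros(numel(nList), numel(t));         % one row of error per n

for p = 1:numel(nList)
    vF = a0*ones(size(t));                     % start from the DC term
    for k = 1:nList(p)                         % sum ALL harmonics 1..n
        ak = 2/T*trapz(t, v.*cos(k*w0*t));
        bk = 2/T*trapz(t, v.*sin(k*w0*t));
        vF = vF + ak*cos(k*w0*t) + bk*sin(k*w0*t);
    end
    err(p,:) = (vF - v)/Vm*100;                % percent error versus time
end
rmsErr = sqrt(mean(err.^2, 2));                % one summary number per n
```

Each row of err can then be plotted against t, for example plot(t, err(1,:)) for n = 3.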

Matlab ode45 new variable

I have MATLAB code that simulates frisbee flight dynamics, and I would like to add a wind variable. I did that, but after looking at the plots I think my wind is directly reducing the speed of the disc. The wind should change the speed of the disc, but only through the lift and drag forces; right now it looks like the wind speed variable changes the disc speed variable directly. What I want is for the wind to affect only the lift and drag forces, but I can't make it work. Here is my current code, which is not working. It is an external M-file that is used by the ode45 function in the main script:
function xdot=discfltEOM(t,x,CoefUsed)
% Equations of Motion for the frisbee
% The inertial frame, xyz = forward, right and down
global m g Ia Id A d rho
global CLo CLa CDo CDa CMo CMa CRr
global CL_data CD_data CM_data CRr_rad CRr_AdvR CRr_data
global CMq CRp CNr
% x = [ x y z vx vy vz f th fd thd gd gamma Wx Wy]
% 1 2 3 4 5 6 7 8 9 10 11 12 13 14
%% give states normal names
vx = x(4);
vy = x(5);
vz = x(6);
f = x(7);
th = x(8);
st = sin(th);
ct = cos(th);
sf = sin(f);
cf = cos(f);
fd = x(9);
thd= x(10);
gd = x(11);
Wx = x(13);
Wy = x(14);
%% Define transformation matrix
%% [c]=[T_c_N] * [N]
T_c_N=[ct st*sf -st*cf; 0 cf sf; st -ct*sf ct*cf];
%% [d]=[T_d_N] * [N]
%T_d_N(1,:)=[cg*ct sg*cf+sf*st*cg sf*sg-st*cf*cg];
%T_d_N(2,:)=[ -sg*ct cf*cg-sf*sg*st sf*cg+sg*st*cf];
%T_d_N(3,:)=[ st -sf*ct cf*ct]
c1=T_c_N(1,:); % c1 expressed in N frame
c2=T_c_N(2,:); % c2 expressed in N frame
c3=T_c_N(3,:); % c3 expressed in N frame
%% calculate aerodynamic forces and moments
%% every vector is expressed in the N frame
vel = [vx vy vz]; %expressed in N
vmag = norm(vel);
Vwiatr = [Wx Wy 0];
Vw = norm(Vwiatr);
vc3=dot(vel,c3); % velocity (scalar) in the c3 direction
vp= [vel-vc3*c3]; % subtract the c3 velocity component to get the velocity vector
% projected onto the plane of the disc, expressed in N
alpha = atan(vc3/norm(vp));
Adp = A*rho*(vmag-Vw)*(vmag-Vw)/2;
uvel = vel/vmag; % unit vector in vel direction, expressed in N
uvp = vp/norm(vp); % unit vector in the projected velocity direction, expressed in N
ulat = cross(c3,uvp); % unit vec perp to v and d3 that points to right, right?
%% first calc moments in uvp (roll), ulat(pitch) directions, then express in n1,n2,n3
omegaD_N_inC = [fd*ct thd fd*st+gd]; % expressed in c1,c2,c3
omegaD_N_inN = T_c_N'*omegaD_N_inC'; % expressed in n1,n2,n3
omegavp = dot(omegaD_N_inN,uvp);
omegalat = dot(omegaD_N_inN,ulat);
omegaspin = dot(omegaD_N_inN,c3); % omegaspin = p1=fd*st+gd
AdvR= d*omegaspin/2/vmag ; % advanced ration
if CoefUsed==1 % using short flights coefficients
CL = CLo + CLa*alpha;
alphaeq = -CLo/CLa; % this is angle of attack at zero lift
CD = CDo + CDa*(alpha-alphaeq)*(alpha-alphaeq);
CM=CMo + CMa*alpha;
%CRr= CRr*d*omegaspinv/2./vmagv';
%CRr= CRr*sqrt(d/g)*omegaspinv; % this line produces NaN, so leave it in Mvp equation
%Mvp = Adp*d* (CRr*d*omegaspin/2/vmag + CRp*omegavp)*uvp; % expressed in N
Mvp = Adp*d*(sqrt(d/g)*CRr*omegaspin + CRp*omegavp)*uvp; % expressed in N
end % if CoefUsed==1 % using short flights coefficients
if CoefUsed==2 % using potts coefficients
%% interpolation of Potts and Crowther (2002) data
CL = interp1(CL_data(:,1), CL_data(:,2), alpha,'spline');
CD = interp1(CD_data(:,1), CD_data(:,2), alpha,'spline');
CM = interp1(CM_data(:,1), CM_data(:,2), alpha,'spline');
CRr = interp2(CRr_rad,CRr_AdvR,CRr_data,alpha,AdvR,'spline');
Mvp = Adp*d* (CRr* + CRp*omegavp)*uvp; % Roll moment, expressed in N
end % if CoefUsed==2 % using potts coefficients
lift = CL*Adp;
drag = CD*Adp;
ulift = -cross(uvel,ulat); % ulift always has - d3 component
udrag = -uvel;
Faero = lift*ulift + drag*udrag; % aero force in N
FgN = [ 0 0 m*g]'; % gravity force in N
F = Faero' + FgN;
Mlat = Adp*d*(CM + CMq*omegalat)*ulat; % Pitch moment expressed in N
Mspin = [0 0 +CNr*(omegaspin)]; % Spin Down moment expressed in C
M = T_c_N*Mvp' + T_c_N*Mlat' + Mspin'; % Total moment expressed in C
% set moments equal to zero if wanted...
% M=[0 0 0];
% calculate the derivatives of the states
xdot = vel';
xdot(4) = (F(1)/m); %accx
xdot(5) = (F(2)/m); %accy
xdot(6) = (F(3)/m); %accz
xdot(7) = fd;
xdot(8) = thd;
xdot(9) = (M(1) + Id*thd*fd*st - Ia*thd*(fd*st+gd) + Id*thd*fd*st)/Id/ct;
xdot(10) = (M(2) + Ia*fd*ct*(fd*st +gd) - Id*fd*fd*ct*st)/Id;
xdot(11) = (M(3) - Ia*(xdot(9)*st + thd*fd*ct))/Ia; % xdot(9) is the fd derivative computed above
xdot(12) = x(11);
xdot(13) = Wx;
xdot(14) = Wy;
% calculate angular momentum
H = [Id 0 0 ; 0 Id 0; 0 0 Ia]*omegaD_N_inC';
format long;
magH = norm(H);
format short;
Wx and Wy are the wind velocity components. I'm trying to affect the Adp variable because it is directly connected with lift and drag. I set Wx = 1 [m/s] and the effect is immense, although it should be very small. I'm terrible with MATLAB, so I'm sure I am making some kind of stupid mistake from not understanding how it all works.
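A hedged sketch of one common way to model wind, offered as an assumption rather than a verified fix for this simulation: compute the aerodynamic quantities from the air-relative velocity vel - wind instead of subtracting the two speeds. The numbers below (rho, A, vel, wind) are hypothetical. Note also that xdot(13) = Wx tells ode45 that dWx/dt = Wx, whose solution grows like exp(t); a steady wind kept as a state would have a zero derivative.

```matlab
% Hypothetical numbers for illustration (not taken from the simulation).
rho  = 1.23;            % air density, kg/m^3 (assumed)
A    = 0.057;           % disc planform area, m^2 (assumed)
vel  = [10 0 0];        % disc velocity in the inertial frame, m/s
wind = [0 1 0];         % steady 1 m/s crosswind in the inertial frame, m/s

% Air-relative velocity: the flow the disc actually experiences
vrel    = vel - wind;
Adp_rel = 0.5*rho*A*norm(vrel)^2;                 % from the vector difference

% Difference of speeds, as in (vmag - Vw)^2, for comparison
Adp_mag = 0.5*rho*A*(norm(vel) - norm(wind))^2;

% A constant wind has zero time derivative if it is kept as an ODE state
wind_dot = [0 0];
```

For this crosswind the relative speed is sqrt(101) m/s, slightly above 10 m/s, whereas the difference of speeds gives 9 m/s. Under the same assumption, alpha, uvel, and vp would also be computed from vrel.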

ode45 solving of diff.equation with further fitting to exp.results

I am building a code to solve a diff. equation:
function dy = KIN1PARM(t,y,k)
% version : first order reaction
% A --> B
% dA/dt = -k*A
% integrated form A = A0*exp(-k*t)
dy = -k.*y;
I want this equation to be solved numerically and the results (y as a function of t, and k) to be used for minimization with respect to the experimental values to get the optimal value of parameter k.
function SSE = SSE_minimization_1parm(tspan_inp,val_exp,k_inp,y0_inp)
f = @(Tt,Ty) KIN1PARM(Tt,Ty,k_inp); % function handle passed to ode45
size_limit = length(y0_inp);
options = odeset('NonNegative',1:size_limit,'RelTol',1e-4,'AbsTol', 1e-4);
[ts,val_theo] = ode45(f, tspan_inp, y0_inp,options); %Cexp is the state variable predicted by the model
err = val_exp - val_theo;
SSE = sum(err.^2); %sum squared-error
The main code to plot the experimental and calculated data is:
% Analyzing first order kinetics
clear all; clc;
figure_title = 'Experimental Data';
label_abscissa = 'Time [s]';
label_ordinatus = 'Concentration [mol/L]';
abscissa = [ 0;
ordinatus = [ 0;
title_string = [' Time [s]', ' | ', ' Complex [mol/L] ', ' '];
for i=1:length(abscissa)
    report_raw_data{i} = sprintf('%1.3E\t',abscissa(i),ordinatus(i));
end
%---------------------/plotting dot data/-------------------------------------
f = figure('Position', [100 100 700 500]);
title(figure_title,'FontName','arial','FontWeight','bold', 'FontSize', 12);
xlabel(label_abscissa, 'FontSize', 12);
ylabel(label_ordinatus, 'FontSize', 12);
grid on; hold on;
marker_style = { 's'};
plot(abscissa,ordinatus, marker_style{1},...
'MarkerFaceColor', 'black',...
'MarkerEdgeColor', 'black',...
options = optimset('Display','iter','TolFun',1e-4,'TolX',1e-4);
CPUtime0 = cputime;
Time_M = abscissa;
Concentration_M = ordinatus;
tspan = Time_M;
y0 = 0;
k0 = rand(1);
[k, fval, exitflag, output] = fminsearch(@(k) SSE_minimization_1parm(tspan,Concentration_M,k,y0),k0,options);
CPUtimex = cputime;
CPUtime_delay = CPUtimex - CPUtime0;
%---------------------/plotting calculated data/-------------------------------------
xupperlimit = Time_M(length(Time_M));
xval = ([0:1:xupperlimit])';
yvector = data4plot_1parm(xval,k,y0);
plot(xval,yvector, 'r');
hold on;
%---------------------/printing calculated data/-------------------------------------
disp(['CPU time: ',sprintf('%0.5f\t',CPUtime_delay),' sec']);
disp(['k: ',sprintf('%1.3E\t',k')]);
disp(['fval: ',sprintf('%1.3E\t',fval)]);
disp(['exitflag: ',sprintf('%1.3E\t',exitflag)]);
disp(['Output: ',output.message]);
The corresponding function, which uses the optimized parameter k to yield the calculated y = f(t) data :
function val = data4plot_1parm(tspan_inp,k_inp,y0_inp)
f = @(Tt,Ty) KIN1PARM(Tt,Ty,k_inp);
size_limit = length(y0_inp);
options = odeset('NonNegative',1:size_limit,'RelTol',1e-4,'AbsTol',1e-4);
[ts,val_theo] = ode45(f, tspan_inp, y0_inp, options);
val = val_theo; % return the computed concentrations
The code runs optimization cycles always giving different values of parameter k, which are different from the value calculated using ln(y) vs t (should be around 7.0e-4 for that series of exp. data).
Looking at the outcome of the ode solver (SSE_minimization_1parm => val_theo) I found that the ode function gives me a vector of zeroes.
Could someone please help me figure out what's going on with the ode solver?
Thanks much in advance !
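A small sketch that may help explain the vector of zeros: for dy/dt = -k*y, the solution is y0*exp(-k*t), so starting from y0 = 0 gives zero at every time for every k, and the optimizer has nothing to fit. The data below are synthetic and generated from assumed values (A0_true, k_true, and the sampling times are hypothetical, not the experimental series). The fit uses the integrated rate law with fminsearch and treats the initial value as a second parameter.

```matlab
% Synthetic data generated from assumed values (not the experimental series).
A0_true = 100;                    % assumed initial concentration
k_true  = 7e-4;                   % assumed rate constant, 1/s
t = (0:500:5000)';                % assumed sampling times, s
y = A0_true*exp(-k_true*t);       % noise-free "measurements"

% With y0 = 0 the first-order model stays at zero for every k
y_zero = 0*exp(-k_true*t);

% Fit A0 = p(1) and k = p(2) with the integrated rate law
model = @(p, tt) p(1)*exp(-p(2)*tt);
sse   = @(p) sum((y - model(p, t)).^2);
opts  = optimset('TolX', 1e-10, 'TolFun', 1e-10, ...
                 'MaxIter', 5000, 'MaxFunEvals', 5000);
p_fit = fminsearch(sse, [90, 1e-3], opts);   % start near, not at, the truth
```

Starting from a deterministic initial guess instead of rand(1) also makes the runs repeatable.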
Here is the best I can offer right now. In my approach I treat the ordinatus values as time and the abscissa values as the measured quantity you are trying to model. You also seem to have set a lot of options for the solver, all of which I omitted. First comes your proposed solution using ode45(), but with a non-zero y0 = 100, which I just "guessed" from looking at the data (in a semilogarithmic plot).
function main
abscissa = [0;
ordinatus = [ 0;
tspan = [min(ordinatus), max(ordinatus)]; % // assuming ordinatus is time
y0 = 100; % // <---- Probably the most important parameter to guess
k0 = -0.1; % // <--- second most important parameter to guess (negative for growth)
k_opt = fminsearch(@minimize, k0) % // optimization only over k
% nested minimization function
function e = minimize(k)
    sol = ode45(@KIN1PARM, tspan, y0, [], k);
    y_hat = deval(sol, ordinatus); % // evaluate solution at given times
    e = sum((y_hat' - abscissa).^2); % // compute squared error
end
% // plot with optimal parameter
[T,Y] = ode45(@KIN1PARM, tspan, y0, [], k_opt);
plot(ordinatus, abscissa,'ko', 'markersize',10,'markerfacecolor','black')
hold on
plot(T,Y, 'r--', 'linewidth', 2)
% // Another attempt with fminsearch and the integral form
t = ordinatus;
t_fit = linspace(min(ordinatus), max(ordinatus))
y = abscissa;
% create model function with parameters A0 = p(1) and k = p(2)
model = @(p, t) p(1)*exp(-p(2)*t);
e = @(p) sum((y - model(p, t)).^2); % minimize squared errors
p0 = [100, -0.1]; % an initial guess (positive A0 and probably negative k for exp. growth)
p_fit = fminsearch(e, p0); % Optimize
% Add to plot
plot(t_fit, model(p_fit, t_fit), 'b-', 'linewidth', 2)
legend('data', 'ode45 with fixed y0', ...
    sprintf('integral form: %5.1f*exp(-%.4f)', p_fit), 'location', 'best')
end
function dy = KIN1PARM(t,y,k)
% version : first order reaction
% A --> B
% dA/dt = -k*A
% integrated form A = A0*exp(-k*t)
dy = -k.*y;
end
Quite surprisingly to me, the initial guess of y0 = 100 fits quite well with the optimal A0 found. The result can be seen below:

Calculating a 2D joint probability distribution

I have many points inside a square. I want to partition the square in many small rectangles and check how many points fall in each rectangle, i.e. I want to compute the joint probability distribution of the points. I am reporting a couple of common sense approaches, using loops and not very efficient:
% Data
N = 1e5; % number of points
xy = rand(N, 2); % coordinates of points
xy(randi(2*N, 100, 1)) = 0; % add some points on one side
xy(randi(2*N, 100, 1)) = 1; % add some points on the other side
xy(randi(N, 100, 1), :) = 0; % add some points on one corner
xy(randi(N, 100, 1), :) = 1; % add some points on one corner
inds= unique(randi(N, 100, 1)); xy(inds, :) = repmat([0 1], numel(inds), 1); % add some points on one corner
inds= unique(randi(N, 100, 1)); xy(inds, :) = repmat([1 0], numel(inds), 1); % add some points on one corner
% Intervals for rectangles
K1 = ceil(sqrt(N/5)); % number of intervals along x
K2 = K1; % number of intervals along y
int_x = [0:(1 / K1):1, 1+eps]; % intervals along x
int_y = [0:(1 / K2):1, 1+eps]; % intervals along y
% First approach
count_cells = zeros(K1 + 1, K2 + 1);
for k1 = 1:K1+1
    inds1 = (xy(:, 1) >= int_x(k1)) & (xy(:, 1) < int_x(k1 + 1));
    for k2 = 1:K2+1
        inds2 = (xy(:, 2) >= int_y(k2)) & (xy(:, 2) < int_y(k2 + 1));
        count_cells(k1, k2) = sum(inds1 .* inds2);
    end
end
% Elapsed time is 46.090677 seconds.
% Second approach
count_again = zeros(K1 + 2, K2 + 2);
for k1 = 1:K1+1
    inds1 = (xy(:, 1) >= int_x(k1));
    for k2 = 1:K2+1
        inds2 = (xy(:, 2) >= int_y(k2));
        count_again(k1, k2) = sum(inds1 .* inds2);
    end
end
count_again_fix = diff(diff(count_again')');
% Elapsed time is 22.903767 seconds.
% Check: the two solutions are equivalent
all(count_cells(:) == count_again_fix(:))
How can I do it more efficiently in terms of time, memory, and possibly avoiding loops?
EDIT --> I have just found this as well, it's the best solution found so far:
count_cells_hist = hist3(xy, 'Edges', {int_x int_y});
count_cells_hist(end, :) = []; count_cells_hist(:, end) = [];
all(count_cells(:) == count_cells_hist(:))
% Elapsed time is 0.245298 seconds.
but it requires the Statistics Toolbox.
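As a hedged alternative that avoids the toolbox dependency: base MATLAB (R2015b and later) provides histcounts2, which takes explicit bin edges for both axes. The sketch below uses a smaller, hypothetical data set and an assumed K = 10 bins per axis; histcounts2 places a point lying exactly on the last edge into the last bin, so no extra 1+eps edge is used here.

```matlab
% histcounts2 is part of base MATLAB (R2015b and later).
rng(1);                          % reproducible example
N  = 1e4;                        % number of points (smaller than in the post)
xy = rand(N, 2);                 % points in the unit square
K  = 10;                         % bins per axis (assumed)
edges = linspace(0, 1, K + 1);   % K equal-width bins on [0, 1]

counts = histcounts2(xy(:,1), xy(:,2), edges, edges);  % K-by-K bin counts
P = counts / N;                  % joint relative frequencies
```

Rows of counts correspond to the x bins and columns to the y bins.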
EDIT --> Testing solution suggested by chappjc
xcomps = single(bsxfun(@ge,xy(:,1),int_x));
ycomps = single(bsxfun(@ge,xy(:,2),int_y));
count_again = xcomps.' * ycomps; %' 143x143 = 143x1e5 * 1e5x143
count_again_fix = diff(diff(count_again')');
% Elapsed time is 0.737546 seconds.
all(count_cells(:) == count_again_fix(:))
I have written a simple mex function which works very well when N is large. Of course it's cheating but still ...
The function is
#include "mex.h"
void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
    unsigned long int hh, ctrl; /* counters */
    unsigned long int N, m, n; /* size of matrices */
    unsigned long int *xy; /* data */
    unsigned long int *count_cells; /* joint frequencies */
    /* matrices needed */
    mxArray *count_cellsArray;
    /* Now we need to get the data */
    if (nrhs == 3) {
        xy = (unsigned long int*) mxGetData(prhs[0]);
        N = (unsigned long int) mxGetM(prhs[0]);
        m = (unsigned long int) mxGetScalar(prhs[1]);
        n = (unsigned long int) mxGetScalar(prhs[2]);
    }
    /* Then build the matrices for the output */
    count_cellsArray = mxCreateNumericMatrix(m + 1, n + 1, mxUINT32_CLASS, mxREAL);
    count_cells = mxGetData(count_cellsArray);
    plhs[0] = count_cellsArray;
    hh = 0; /* counter for elements of xy */
    /* for all points from 1 to N */
    for (hh = 0; hh < N; hh++) {
        ctrl = (m + 1) * xy[N + hh] + xy[hh];
        count_cells[ctrl] = count_cells[ctrl] + 1;
    }
}
It can be saved in a file "joint_dist_points_2D.c", then compiled:
mex joint_dist_points_2D.c
And check it out:
% Data
N = 1e7; % number of points
xy = rand(N, 2); % coordinates of points
xy(randi(2*N, 1000, 1)) = 0; % add some points on one side
xy(randi(2*N, 1000, 1)) = 1; % add some points on the other side
xy(randi(N, 1000, 1), :) = 0; % add some points on one corner
xy(randi(N, 1000, 1), :) = 1; % add some points on one corner
inds= unique(randi(N, 1000, 1)); xy(inds, :) = repmat([0 1], numel(inds), 1); % add some points on one corner
inds= unique(randi(N, 1000, 1)); xy(inds, :) = repmat([1 0], numel(inds), 1); % add some points on one corner
% Intervals for rectangles
K1 = ceil(sqrt(N/5)); % number of intervals along x
K2 = ceil(sqrt(N/7)); % number of intervals along y
int_x = [0:(1 / K1):1, 1+eps]; % intervals along x
int_y = [0:(1 / K2):1, 1+eps]; % intervals along y
% Use Statistics Toolbox: hist3
count_cells_hist = hist3(xy, 'Edges', {int_x int_y});
count_cells_hist(end, :) = []; count_cells_hist(:, end) = [];
% Elapsed time is 4.414768 seconds.
% Use mex function
xy2 = uint32(floor(xy ./ repmat([1 / K1, 1 / K2], N, 1)));
count_cells = joint_dist_points_2D(xy2, uint32(K1), uint32(K2));
% Elapsed time is 0.586855 seconds.
% Check: the two solutions are equivalent
all(count_cells_hist(:) == count_cells(:))
Improving on code in question
Your loops (and the nested dot product) can be eliminated with bsxfun and matrix multiplication as follows:
xcomps = bsxfun(@ge,xy(:,1),int_x);
ycomps = bsxfun(@ge,xy(:,2),int_y);
count_again = double(xcomps).'*double(ycomps); %' 143x143 = 143x1e5 * 1e5x143
count_again_fix = diff(diff(count_again')');
The multiplication step accomplishes the AND and summation done in sum(inds1 .* inds2), but without looping over the density matrix. EDIT: If you use single instead of double, execution time is nearly halved, but be sure to convert your answer to double or whatever is required for the rest of the code. On my computer this takes around 0.5 sec.
Note: With rot90(count_again/size(xy,1),2) you have a CDF, and in rot90(count_again_fix/size(xy,1),2) you have a PDF.
Using accumarray
Another approach is to use accumarray to make the joint histogram after we bin the data.
Starting with int_x, int_y, K1, xy, etc.:
% map (0,1) data onto [1 K1], following A. Donda's approach for easy comparison
ii = floor(xy(:,1)*(K1-eps))+1; ii(ii<1) = 1; ii(ii>K1) = K1;
jj = floor(xy(:,2)*(K1-eps))+1; jj(jj<1) = 1; jj(jj>K1) = K1;
% create the histogram and normalize
H = accumarray([ii jj],ones(1,size(ii,1)));
PDF = H / size(xy,1); % for probabilities summing to 1
On my computer, this takes around 0.01 sec.
The output is the same as A. Donda's, converted from sparse to full (full(H)). However, as A. Donda pointed out, it is correct for the dimensions to be K1xK1, rather than the K1+1xK1+1 size of count_again_fix in the OP's code.
To get the CDF, I believe you can simply apply cumsum to each axis of the PDF.
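A minimal sketch of that idea, under stated assumptions: the data set, the seed, and K = 8 bins per axis are hypothetical, and the binning is a simplified floor-and-clamp variant of the code above. It builds the joint histogram with accumarray and then applies cumsum along each dimension.

```matlab
% Small hypothetical data set to keep the check easy to follow.
rng(2);                                      % reproducible example
N  = 1e4;                                    % number of points
xy = rand(N, 2);                             % points in the unit square
K  = 8;                                      % bins per axis (assumed)

% Bin indices in 1..K, clamped at the edges
ii = min(max(floor(xy(:,1)*K) + 1, 1), K);
jj = min(max(floor(xy(:,2)*K) + 1, 1), K);

H   = accumarray([ii jj], 1, [K K]);         % joint histogram (counts)
PDF = H / N;                                 % probabilities summing to 1
CDF = cumsum(cumsum(PDF, 1), 2);             % cumulate along both axes
```

The last entry CDF(end,end) should equal 1, and the CDF should be non-decreasing along both axes.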
chappjc's answer and using hist3 are all good, but since I happened to want to have something like this some time ago and for some reason didn't find hist3 I wrote it myself, and I thought I'd post it here as a bonus. It uses sparse to do the actual counting and returns the result as a sparse matrix, so it may be useful for dealing with a multimodal distribution where different modes are far apart – or for someone who doesn't have the Statistics Toolbox.
Application to francesco's data:
K1 = ceil(sqrt(N/5));
[H, xs, ys] = hist2d(xy(:, 1), xy(:, 2), [K1 K1], [0, 1 + eps, 0, 1 + eps]);
Called with output arguments, the function just returns the result; called without them, it makes a color plot.
Here's the function:
function [H, xs, ys] = hist2d(x, y, n, ax)
% plot 2d-histogram as an image
% hist2d(x, y, n, ax)
% [H, xs, ys] = hist2d(x, y, n, ax)
% x: data for horizontal axis
% y: data for vertical axis
% n: how many bins to use for each axis, default is [100 100]
% ax: axis limits for the plot, default is [min(x), max(x), min(y), max(y)]
% H: 2d-histogram as a sparse matrix, indices 1 & 2 correspond to x & y
% xs: corresponding vector of x-values
% ys: corresponding vector of y-values
% x and y have to be column vectors of the same size. Data points
% outside of the axis limits are allocated to the first or last bin,
% respectively. If output arguments are given, no plot is generated;
% it can be reproduced by "imagesc(ys, xs, H'); axis xy".
% defaults
if nargin < 3
    n = [100 100];
end
if nargin < 4
    ax = [min(x), max(x), min(y), max(y)];
end
% parameters
nx = n(1);
ny = n(2);
xl = ax(1 : 2);
yl = ax(3 : 4);
% generate histogram
i = floor((x - xl(1)) / diff(xl) * nx) + 1;
i(i < 1) = 1;
i(i > nx) = nx;
j = floor((y - yl(1)) / diff(yl) * ny) + 1;
j(j < 1) = 1;
j(j > ny) = ny;
H = sparse(i, j, ones(size(i)), nx, ny);
% generate axes
xs = (0.5 : nx) / nx * diff(xl) + xl(1);
ys = (0.5 : ny) / ny * diff(yl) + yl(1);
% possibly plot
if nargout == 0
    imagesc(ys, xs, H')
    axis xy
    clear H xs ys
end