SciPy interpolation method using a Dask dataframe

I have read a bunch of Dask examples, from people's GitHub code and from the Dask issue tracker, but I still have a problem using SciPy interpolation with Dask parallel computing, and I hope someone here can help me solve it.
My actual issue is how to expand each partition's boundary. Please see my description below, and let me know if anything is unclear.
My data is unstructured, so I cannot use a regular array.
My interpolation code runs, but some strange points appear in the output; I suspect this is an edge effect. For example:
The left panel is the output from the "dask" LinearNDInterpolator, while the middle panel is the original dataset and the right panel uses LinearNDInterpolator directly.
The left panel is a scatter plot linearly interpolated from the original dataset, while the right-hand one uses Dask linear interpolation via dask.dataframe [parallel].
You can clearly see that the parallel-computing result has no clear shape, and strange points appear within the map.
Here is my code 01, using dask.array:
import numpy as np
import pandas as pd
import dask.array as da
import dask.dataframe as dd
from dask import delayed
from scipy.interpolate import LinearNDInterpolator as lNDI  # alias assumed from context

def lNDIwrap(src_lon, src_lat, src_elv, tag_lon, tag_lat):
    return lNDI(list(zip(src_lon, src_lat)), src_elv)(tag_lon, tag_lat)

n_splits = 96
#--- topodf is the topography dataset [pandas].
#--- df is the h3-generated hexagon grid built from topodf at resolution 12.
dsrc = dd.from_pandas(topodf, npartitions=n_splits)
dtag = dd.from_pandas(df, npartitions=n_splits)
#--- Convert to chunked dask arrays.
slon, slat, data = dsrc.to_dask_array(lengths=True).T
tlon, tlat = dtag[['lon','lat']].to_dask_array(lengths=True).T
#--- Use dask delayed to pass each partition into lNDIwrap.
gd_chunked_lNDI = [delayed(lNDIwrap)(x1, y1, newarr, xx, yy)
                   for x1, y1, newarr, xx, yy in
                   zip(slon.to_delayed().flatten(),
                       slat.to_delayed().flatten(),
                       data.to_delayed().flatten(),
                       tlon.to_delayed().flatten(),
                       tlat.to_delayed().flatten())]
#--- Use dask delayed to concatenate all partitions into one array
#    (np.concatenate would also work, since each chunk is a NumPy array at compute time).
gd_lNDI = delayed(da.concatenate)(gd_chunked_lNDI, axis=0)
results_lNDI = np.array(gd_lNDI.compute())
Here is my code 02, using dask.dataframe:
def DDlNDIwrap(df, data_name='nir'):
    dtag = target_hexgrid(df, hex_res=12)
    slon, slat, data = df[['lon','lat',data_name]].values.T
    tlon, tlat = dtag[['lon','lat']].values.T
    tout = lNDI(list(zip(slon, slat)), data)(tlon, tlat)
    dout = pd.DataFrame(np.vstack([tlon, tlat, tout]).T,
                        columns=['lon','lat',data_name])
    return dd.from_pandas(dout, npartitions=1)

n_splits = 96
#--- ds is the Sentinel-2 satellite dataset, read from a .til file.
gd_chunked_lNDI = [delayed(DDlNDIwrap)(ds) for ds in dsrc.to_delayed()]
gd_lNDI = delayed(dd.concat)(gd_chunked_lNDI, axis=0)
#--- The first compute() yields a dask dataframe, the second a pandas one.
gd = gd_lNDI.compute().compute()
I suspect the unknown patterns are an edge/side effect: those points lie near the edge of each partition, where there are not enough surrounding data points for interpolation. The Dask manual suggests I could use map_overlap, map_partitions, or map_blocks to solve this, but I keep failing to apply them. Could someone help me solve this?
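To make the goal concrete, here is a rough sketch of the partition-padding idea I am trying to implement. All names here are hypothetical (interp_tile, tiled_interp; src is a pandas dataframe with columns lon/lat/z, tag has lon/lat); it is an illustration, not working production code:

import numpy as np
import pandas as pd
from dask import delayed
from scipy.interpolate import LinearNDInterpolator

def interp_tile(src, tag, lon0, lon1, lat0, lat1, pad):
    # source points come from the *expanded* box, targets from the exact box
    s = src[(src.lon >= lon0 - pad) & (src.lon <= lon1 + pad) &
            (src.lat >= lat0 - pad) & (src.lat <= lat1 + pad)]
    t = tag[(tag.lon >= lon0) & (tag.lon < lon1) &      # half-open so tiles
            (tag.lat >= lat0) & (tag.lat < lat1)]       # do not overlap
    f = LinearNDInterpolator(np.c_[s.lon.values, s.lat.values], s.z.values)
    return t.assign(z=f(t.lon.values, t.lat.values))

def tiled_interp(src, tag, n=4, pad=0.001):
    # split the domain into n x n spatial tiles; pad (in degrees) gives each
    # tile enough neighbors so triangles span the tile boundaries
    lons = np.linspace(src.lon.min(), src.lon.max(), n + 1)
    lats = np.linspace(src.lat.min(), src.lat.max(), n + 1)
    tiles = [delayed(interp_tile)(src, tag, lons[i], lons[i + 1],
                                  lats[j], lats[j + 1], pad)
             for i in range(n) for j in range(n)]
    return delayed(pd.concat)(tiles).compute()

The pad would need to be at least a few source-grid spacings so that every target point near a tile edge still sits inside a triangle of source points.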
PS:
Here is what I tried using the map_overlap function.
def maplNDIwrap(df, data_name='nir'):
    dtag = target_hexgrid(df, hex_res=12)
    slon, slat, data = df[['lon','lat',data_name]].values.T
    tlon, tlat = dtag[['lon','lat']].values.T
    tout = lNDI(list(zip(slon, slat)), data)(tlon, tlat)
    print(len(tlon), len(tout))
    dout = pd.DataFrame(np.vstack([tlon, tlat, tout]).T,
                        columns=['lon','lat',data_name])
    return dout

dtag = target_hexgrid(dsrc.compute(), hex_res=12)
gd_map_lNDI = dsrc.map_overlap(maplNDIwrap, 1, 1, meta=type(dsrc))
print(len(dtag))        #--> output: 353678
gd_map_lNDI.compute()   #--> does not match the expected size
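Note that dask.dataframe's map_overlap shares a fixed number of *rows* with neighboring partitions, not a spatial neighborhood, and meta is expected to describe the output schema (e.g., an empty dataframe), not to be a type. A minimal sketch of a corrected call, assuming the output columns are lon/lat/nir:

import pandas as pd

# meta describes the output schema; map_overlap(..., 1, 1, ...) shares one
# row with each neighboring partition, which is not a spatial buffer.
meta = pd.DataFrame({'lon': pd.Series(dtype='f8'),
                     'lat': pd.Series(dtype='f8'),
                     'nir': pd.Series(dtype='f8')})
gd_map_lNDI = dsrc.map_overlap(maplNDIwrap, 1, 1, meta=meta)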
Updates:
Here I define a few functions to generate a synthetic dataset.
def lonlat(lon_min, lon_max, lat_min, lat_max, res=5):
    xps = round((lon_max - lon_min)*110*1e3/res)
    yps = round((lat_max - lat_min)*110*1e3/res)
    return np.meshgrid(np.linspace(lon_min, lon_max, xps),
                       np.linspace(lat_min, lat_max, yps))

def xy_based_map(x, y):
    x = np.pi*x/180
    y = np.pi*y/180
    return np.log10((1 - x/3. + x**2 + (2*x*y)**3) * np.exp(-x**2 - y**2))
Using the same method I described above produces the result below; you can clearly see lines in the interpolation outputs.
lon_min, lon_max, lat_min, lat_max = -70.42, -70.40, -30.42, -30.40
lon05, lat05 = lonlat(lon_min, lon_max, lat_min, lat_max, res=5)
z05 = xy_based_map(lon05, lat05)
df05 = pd.DataFrame(np.vstack((lon05.ravel(), lat05.ravel(), z05.ravel())).T, columns=['lon','lat','z'])
df05 = dd.from_pandas(df05, npartitions=n_splits)
#--- df10 (used below) is built the same way with res=10.
lon30, lat30 = lonlat(lon_min, lon_max, lat_min, lat_max, res=30)
z30 = xy_based_map(lon30, lat30)
df30 = pd.DataFrame(np.vstack((lon30.ravel(), lat30.ravel(), z30.ravel())).T, columns=['lon','lat','z'])
df30 = dd.from_pandas(df30, npartitions=n_splits)
###--- 10m --> 5m
tlon, tlat = df05[['lon','lat']].values.T
slon, slat, data = df10.values.T
#--- cNDIwrap and rNDIwrap are wrappers analogous to lNDIwrap for the other
#    interpolators (defined elsewhere in my code).
gd_chunked_lNDI = [delayed(lNDIwrap)(x1, y1, newarr, xx, yy)
                   for x1, y1, newarr, xx, yy in
                   zip(slon.to_delayed().flatten(),
                       slat.to_delayed().flatten(),
                       data.to_delayed().flatten(),
                       tlon.to_delayed().flatten(),
                       tlat.to_delayed().flatten())]
gd_chunked_cNDI = [delayed(cNDIwrap)(x1, y1, newarr, xx, yy)
                   for x1, y1, newarr, xx, yy in
                   zip(slon.to_delayed().flatten(),
                       slat.to_delayed().flatten(),
                       data.to_delayed().flatten(),
                       tlon.to_delayed().flatten(),
                       tlat.to_delayed().flatten())]
gd_chunked_rNDI = [delayed(rNDIwrap)(x1, y1, newarr, xx, yy)
                   for x1, y1, newarr, xx, yy in
                   zip(slon.to_delayed().flatten(),
                       slat.to_delayed().flatten(),
                       data.to_delayed().flatten(),
                       tlon.to_delayed().flatten(),
                       tlat.to_delayed().flatten())]
gd_lNDI = delayed(da.concatenate)(gd_chunked_lNDI, axis=0)
gd_cNDI = delayed(da.concatenate)(gd_chunked_cNDI, axis=0)
gd_rNDI = delayed(da.concatenate)(gd_chunked_rNDI, axis=0)
results_lNDI_10m = np.array(gd_lNDI.compute())
results_cNDI_10m = np.array(gd_cNDI.compute())
results_rNDI_10m = np.array(gd_rNDI.compute())
#--- No parallel computing
a, b, c, d, e = slon.compute(), slat.compute(), data.compute(), tlon.compute(), tlat.compute()
straight_lNDI_10m = lNDIwrap(a, b, c, d, e)
###--- 30m --> 5m
tlon, tlat = df05[['lon','lat']].values.T
slon, slat, data = df30.values.T
gd_chunked_lNDI = [delayed(lNDIwrap)(x1, y1, newarr, xx, yy)
                   for x1, y1, newarr, xx, yy in
                   zip(slon.to_delayed().flatten(),
                       slat.to_delayed().flatten(),
                       data.to_delayed().flatten(),
                       tlon.to_delayed().flatten(),
                       tlat.to_delayed().flatten())]
gd_chunked_cNDI = [delayed(cNDIwrap)(x1, y1, newarr, xx, yy)
                   for x1, y1, newarr, xx, yy in
                   zip(slon.to_delayed().flatten(),
                       slat.to_delayed().flatten(),
                       data.to_delayed().flatten(),
                       tlon.to_delayed().flatten(),
                       tlat.to_delayed().flatten())]
gd_chunked_rNDI = [delayed(rNDIwrap)(x1, y1, newarr, xx, yy)
                   for x1, y1, newarr, xx, yy in
                   zip(slon.to_delayed().flatten(),
                       slat.to_delayed().flatten(),
                       data.to_delayed().flatten(),
                       tlon.to_delayed().flatten(),
                       tlat.to_delayed().flatten())]
gd_lNDI = delayed(da.concatenate)(gd_chunked_lNDI, axis=0)
gd_cNDI = delayed(da.concatenate)(gd_chunked_cNDI, axis=0)
gd_rNDI = delayed(da.concatenate)(gd_chunked_rNDI, axis=0)
results_lNDI_30m = np.array(gd_lNDI.compute())
results_cNDI_30m = np.array(gd_cNDI.compute())
results_rNDI_30m = np.array(gd_rNDI.compute())
###--- No parallel computing for 30m --> 5m
a, b, c, d, e = slon.compute(), slat.compute(), data.compute(), tlon.compute(), tlat.compute()
straight_lNDI_30m = lNDIwrap(a, b, c, d, e)
###--- For plots.
dout = pd.DataFrame(np.vstack((tlon.compute(), tlat.compute(), df05.z.values.compute(),
                               results_lNDI_10m, results_cNDI_10m, results_rNDI_10m, straight_lNDI_10m,
                               results_lNDI_30m, results_cNDI_30m, results_rNDI_30m, straight_lNDI_30m,
                               )).T,
                    columns=['lon','lat','orig','lNDI10','cNDI10','rNDI10','stgh10',
                             'lNDI30','cNDI30','rNDI30','stgh30'])

Related

Problem with normalizing a function with likelihood

I have a problem with the following code: in the evidence part, the values are so small that, in the end, the probabilities cannot be computed. I need to normalize, but in which part should I do it?
The code in MATLAB is:
clear all; close all; clc;
randn('seed', 1234);
resistivities = [50 200 2000 1500];
thicknesses = [500 100 200];
Par_real = [resistivities, thicknesses];
dataFreq = logspace(log10(0.001), log10(1000), 100);
[Ydata, phase] = modelMT2(Par_real, dataFreq);
sigma = 0.1;
Yexp = Ydata + sigma*randn(size(Ydata));
plot(dataFreq, Yexp, '.'); hold on; plot(dataFreq, Ydata, '-')
nsamples = 20000;
R1 = 5;
R2 = 2050;
P1 = 25;
P2 = 500;
Resis = R1 + (R2-R1)*rand(nsamples, 7);
Profs = P1 + (P2-P1)*rand(nsamples, 6);
for ii = 1:nsamples
    par3C = [Resis(ii, 1:3), Profs(ii, 1:2)];
    par4C = [Resis(ii, 1:4), Profs(ii, 1:3)];
    par5C = [Resis(ii, 1:5), Profs(ii, 1:4)];
    par7C = [Resis(ii, 1:7), Profs(ii, 1:6)];
    Like_M3C(ii) = log_likelihood(@modelMT2, dataFreq, Yexp, sigma, par3C);
    Like_M4C(ii) = log_likelihood(@modelMT2, dataFreq, Yexp, sigma, par4C);
    Like_M5C(ii) = log_likelihood(@modelMT2, dataFreq, Yexp, sigma, par5C);
    Like_M7C(ii) = log_likelihood(@modelMT2, dataFreq, Yexp, sigma, par7C);
end
figure()
subplot(1, 2, 1)
plot(exp(Like_M5C))
subplot(1, 2, 2)
hist(exp(Like_M5C))
Evidencia(1) = mean(exp(Like_M3C));
Evidencia(2) = mean(exp(Like_M4C));
Evidencia(3) = mean(exp(Like_M5C));
Evidencia(4) = mean(exp(Like_M7C));
Denominador = sum(Evidencia);
PPMM = Evidencia/Denominador;
fprintf('La probabilidad de los modelos : \n');
fprintf('--------------------------------\n');
fprintf('Modelo M3C: %.4f \n', PPMM(1));
fprintf('Modelo M4C: %.4f \n', PPMM(2));
fprintf('Modelo M5C: %.4f \n', PPMM(3));
fprintf('Modelo M7C: %.4f \n', PPMM(4));
figure()
model = [1, 2, 3, 4];
bar(model, PPMM), grid on
ylim([0, 1])
xlabel('Modelo')
ylabel('Probabilidad del modelo')
function [LogPDF_post] = log_likelihood(Mod, xx, data, sigma, oldpar)
    erro = (Mod(oldpar, xx) - data)';
    LogPDF_post = -0.5 * erro' * 1/sigma^2 * erro;
end
I have tried to normalize the likelihood as follows, but it doesn't work: it gives equal probability in all cases.
function [LogPDF_norma] = log_likelihood(Mod, xx, data, sigma, oldpar)
    erro = (Mod(oldpar, xx) - data)';
    LogPDF_post = -0.5 * erro' * 1/sigma^2 * erro;
    LogPDF_norma = (1/max(LogPDF_post))*LogPDF_post;
end
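The underlying issue is floating-point underflow in exp(Like_*). The standard remedy is the log-sum-exp trick: subtract the overall maximum log-likelihood before exponentiating; the shift cancels when the evidences are normalized. A minimal sketch of the idea in Python/NumPy (my illustration, assuming the four per-sample log-likelihood arrays, like Like_M3C ... Like_M7C, have already been computed):

import numpy as np

def model_probabilities(log_likes):
    # log_likes: list of 1-D arrays of per-sample log-likelihoods, one per model
    m = max(ll.max() for ll in log_likes)                      # global maximum
    # shift so the largest exponent is exp(0) = 1, avoiding underflow
    evidence = np.array([np.mean(np.exp(ll - m)) for ll in log_likes])
    return evidence / evidence.sum()                           # exp(m) cancels here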

how to create a surface in matlab using interpolation

I have x y z data which looks like the following:
How do i create a surface across the lines using z values in matlab (interpolated surface)?
I tried the following method, but I am getting this error:
[fn,pn] = uigetfile('*.xyz','Open the file');
I = importdata([pn,fn], ',', 16);
x = I.data(:,1);
y = I.data(:,2);
z = I.data(:,3);
%%
spX = min(x):3:max(x);
spY = min(y):3:max(y);
[xC,yC] = meshgrid(spX,spY);
Vq = interp2(x,y,z,xC,yC);
Error using griddedInterpolant
The grid vectors must be strictly monotonically increasing.
Error in interp2>makegriddedinterp (line 229)
F = griddedInterpolant(varargin{:});
Error in interp2 (line 129)
F = makegriddedinterp({X, Y}, V, method,extrap);
Try griddata
spX = min(x):3:max(x);
spY = min(y):3:max(y);
[xC,yC] = meshgrid(spX,spY);
zC = griddata(x,y,z,xC,yC);
surf(xC,yC,zC)
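For comparison, the SciPy equivalent for scattered data (a sketch, assuming x, y, z are the 1-D arrays loaded above):

import numpy as np
from scipy.interpolate import griddata

# build the 3-unit target grid and interpolate the scattered (x, y, z) points
xC, yC = np.meshgrid(np.arange(x.min(), x.max(), 3),
                     np.arange(y.min(), y.max(), 3))
zC = griddata((x, y), z, (xC, yC), method='linear')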

4d integral in matlab

I'm trying to evaluate an integral in MATLAB whose equation contains 4 related random variables, so the boundaries of the integrals are not constant.
There are 2 exponential pdfs, 2 hyperexponential pdfs, and 1 Rayleigh CDF, all multiplied together along with (x - y - z).
I'm trying to evaluate it using Q = integral(@(w) integral3(fun, xmin, xmax, ymin, ymax, zmin, zmax), wmin, wmax);
I always get an error. Here is my code:
u_x = 10; % rate!
x_th = .3;
sigma = 1.33;
u_y = 10;
u_w = 100;
a= 1;
fun = @(x,y,z,w) (x - y - z )*u_x*exp(-u_x*x)*u_y*exp(-u_y*y)*((a/(a+1))*(a*u_w)*exp(-a*u_w*w)+((1/(a+1))*(u_w/a))*exp(-u_w*w/a))*((a/(a+1))*(a*u_w)*exp(-a*u_w*z)+((1/(a+1))*(u_w/a))*exp(-u_w*z/a))*(1-exp(-x_th/sigma^2))
xmin = @(y) y;
xmax = @(y,w) y + w;
ymin = 0;
ymax = inf;
zmin = 0;
zmax = @(w) w;
wmin = 0;
wmax = inf;
Q = integral(@(w) integral3(fun,xmin,xmax,ymin,ymax,zmin,zmax),wmin,wmax);
ERROR MESSAGE :
Error using integral3 (line 63)
XMIN must be a floating point scalar.
Error in numerical_int>@(w)integral3(fun,xmin,xmax,ymin,ymax,zmin,zmax)
Error in integralCalc/iterateScalarValued (line 314)
fx = FUN(t);
Error in integralCalc/vadapt (line 132)
[q,errbnd] = iterateScalarValued(u,tinterval,pathlen);
Error in integralCalc (line 83)
[q,errbnd] = vadapt(@AToInfInvTransform,interval);
Error in integral (line 88)
Q = integralCalc(fun,a,b,opstruct);
Error in numerical_int (line 28)
Q = integral(@(w) integral3(fun,xmin,xmax,ymin,ymax,zmin,zmax),wmin,wmax);
integral3() requires all of the lower bounds to be real numbers.
int() does not, but you need to use syms variables instead of function handles.
For element-wise calculation, use the dot operator together with the operator in question:
1) * ---> .*
2) / ---> ./
3) ^ ---> .^
Read this for more information on how to use int() for an n-d integral.
The code is as follows:
syms x y z w
u_x = 10; % rate!
x_th = .3;
sigma = 1.33;
u_y = 10;
u_w = 100;
a= 1;
fun = (x - y - z ).*u_x.*exp(-u_x*x).*u_y.*exp(-u_y.*y)...
.*((a./(a+1)).*(a.*u_w).*exp(-a.*u_w.*w)+((1./(a+1)).*(u_w./a))...
.*exp(-u_w.*w./a)).*((a./(a+1)).*(a*u_w)*exp(-a.*u_w.*z)...
+((1./(a+1)).*(u_w./a))*exp(-u_w.*z./a)).*(1-exp(-x_th./sigma.^2));
xmin = y;
xmax = y + w;
ymin = 0;
ymax = inf;
zmin = 0;
zmax = w;
wmin = 0;
wmax = inf;
% Integrate along x
intx = int(fun, x, xmin, xmax);
% Integrate along y
intxy = int(intx, y, ymin, ymax);
% Integrate along z
intxyz = int(intxy, z, zmin, zmax);
% Integrate along w
intxyzw = int(intxyz, w, wmin, wmax);
value = vpa(intxyzw, 3);
% 2.14e-5
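As a numerical cross-check (my addition, not part of the original answer), SciPy's integrate.nquad accepts callable limits, so the same integral can be evaluated numerically; a sketch (it may run slowly because of the infinite outer limits):

import numpy as np
from scipy import integrate

u_x, u_y, u_w = 10, 10, 100
x_th, sigma, a = 0.3, 1.33, 1

def hyper(v):
    # the two-branch hyperexponential pdf used for both w and z
    return ((a/(a+1))*(a*u_w)*np.exp(-a*u_w*v)
            + (1/(a+1))*(u_w/a)*np.exp(-u_w*v/a))

def fun(x, y, z, w):
    return ((x - y - z) * u_x*np.exp(-u_x*x) * u_y*np.exp(-u_y*y)
            * hyper(w) * hyper(z) * (1 - np.exp(-x_th/sigma**2)))

# nquad integrates the first argument innermost; a limit may be a callable of
# the variables that come after it: x in [y, y+w], z in [0, w]
Q, err = integrate.nquad(fun, [lambda y, z, w: (y, y + w),  # x limits
                               (0, np.inf),                 # y limits
                               lambda w: (0, w),            # z limits
                               (0, np.inf)])                # w limits
print(Q, err)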

Get Y value of line from X pixel value in ChartJS 2

I have a line graph in Chart.js, and I want to find the Y value for an arbitrary point on the line, given the pixel value from the x-axis.
[My graph (screenshot)]
Currently I'm hooking into the afterDatasetsDraw event to add that shaded region to the graph, but I also want to find out the values of the black line (Axis B) at the start and end of the shaded region, which don't necessarily line up with my data points.
afterDatasetsDraw: function (chart) {
    var options = chart.config.options.plugins.shader;
    if (!options.hasOwnProperty('points')) {
        return;
    }
    if (options.points.length < 2) {
        return;
    }
    var ctx = chart.chart.ctx;
    var x1, y1, x2, y2, x3, y3, x4, y4, x0, xf;
    console.log(chart);
    x0 = chart.scales['x-axis-0'].left;
    xf = chart.scales['x-axis-0'].right;
    x1 = ((xf - x0) * 0.12) + x0; // start shading at 12% in for example
    y1 = chart.scales['A'].bottom;
    x2 = x1;
    y2 = chart.scales['A'].top;
    x3 = ((xf - x0) * 0.66) + x0; // end shading at 66% for example
    y3 = y2;
    x4 = x3;
    y4 = y1;
    // console.log(chart.scales['B'].getValueForPixel(x1));
    // console.log(chart.scales['B'].getValueForPixel(x3));
    // console.log(chart.scales['A'].getValueForPixel(x1));
    // console.log(chart.scales['A'].getValueForPixel(x3));
    // console.log(chart.scales['x-axis-0'].getValueForPixel(x1));
    // console.log(chart.scales['x-axis-0'].getValueForPixel(x3));
    ctx.fillStyle = 'rgba(127, 127, 127, 0.3)';
    ctx.beginPath();
    ctx.moveTo(x1, y1);
    ctx.lineTo(x2, y2);
    ctx.lineTo(x3, y3);
    ctx.lineTo(x4, y4);
    ctx.lineTo(x1, y1);
    ctx.closePath();
    ctx.fill();
}
});
I would assume from the docs that I could use the scale's getValueForPixel() method (as shown commented out), but those calls return strange values. The x-axis values come back as 1 and 3, which, as far as I can tell, are the indexes of the closest data points; the Y scales return numbers that don't correspond to anything in the dataset or the rendered graph.
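Conceptually, once the pixel is mapped back to an x value, the value I am after is just a linear interpolation between the two data points that bracket that x. A minimal sketch of that math (in Python, purely for illustration; the xs/ys arrays are stand-ins for the dataset behind the 'B' axis):

import numpy as np

xs = np.array([0.0, 1.0, 2.0, 3.0, 4.0])   # data-point x values (axis units)
ys = np.array([5.0, 7.0, 6.0, 9.0, 8.0])   # corresponding 'B'-axis values

def y_on_line(x):
    # piecewise-linear interpolation, i.e. the value of the drawn line at x
    return float(np.interp(x, xs, ys))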

Looping through points in the unit triangle

I have a problem where I have two choice variables, x1 and x2, which then pin down a third, x3 = 1 - x1 - x2. I would like to loop through various values of [x1, x2, x3]. This code works:
w1 = perms([0.1, 0.1, 0.8]);
w2 = perms([0.1, 0.2, 0.7]);
w3 = perms([0.1, 0.3, 0.6]);
w4 = perms([0.1, 0.4, 0.5]);
w5 = perms([0.2, 0.2, 0.6]);
w6 = perms([0.2, 0.3, 0.5]);
w7 = perms([0.2, 0.4, 0.4]);
w8 = perms([0.3, 0.3, 0.4]);
w = [w1; w2; w3; w4; w5; w6; w7; w8];
w = unique(w,'rows');
% loop
for ii = 1:size(w, 1)
    % ... do some stuff with w(ii, :)
end
but I am wondering if there is a more elegant way to do this.
This is a classic case for ndgrid:
[x1, x2] = ndgrid(0.1:0.1:0.8, 0.1:0.1:0.8);
x3 = 1 - x1 - x2;
% I assume from your example that we want x1, x2, x3 in the OPEN interval (0,1), then:
valid_points = x3 > 0 & x3 < 1;
w_prime = [x1(valid_points), x2(valid_points), x3(valid_points)];
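For comparison, the same enumeration is equally compact in Python/NumPy (a sketch, not part of the original answer):

import numpy as np

# grid of candidate (x1, x2) pairs with step 0.1, mirroring ndgrid
x1, x2 = np.meshgrid(np.arange(0.1, 0.81, 0.1), np.arange(0.1, 0.81, 0.1))
x3 = 1 - x1 - x2
valid = (x3 > 0) & (x3 < 1)
w = np.column_stack([x1[valid], x2[valid], x3[valid]])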