PyTorch transformation on MNIST dataset - neural-network

I currently have a project with Weak Supervision where I need to put a "masking" in front of a dataset. My issue right now is that I don't exactly know how to do it. Let me explain further with some code and images.
I am using the MNIST dataset that I have to edit in this way. As you can see a middle square is cut out. The code below is used to edit the MNIST using a for loop.
for i in range(int(image_size/2-5),int(image_size/2+3)):
    for j in range(int(image_size/2-5),int(image_size/2+3)):
        image[i][j] = 0
However, I am currently not sure how I should use this in a dataloader transform. The code for the dataloader and transform is shown here:
transform = torchvision.transforms.Compose([torchvision.transforms.ToTensor()])
train_dataset = torchvision.datasets.MNIST(
    root="~/torch_datasets", train=True, transform=transform, download=True
)
test_dataset = torchvision.datasets.MNIST(
    root="~/torch_datasets", train=False, transform=transform, download=True
)
train_loader = torch.utils.data.DataLoader(
    train_dataset, batch_size=128, shuffle=True, num_workers=4, pin_memory=True
)
test_loader = torch.utils.data.DataLoader(
    test_dataset, batch_size=32, shuffle=False, num_workers=4
)
def imshow(img):
    #img = img / 2 + 0.5 # unnormalize
    npimg = img.numpy()
    plt.imshow(np.transpose(npimg, (1, 2, 0)))
dataiter = iter(train_loader)
# DataLoader iterators have no .next() method in current PyTorch; use the builtin next()
images, labels = next(dataiter)
imshow(torchvision.utils.make_grid(images))
So is there a straightforward way to apply the transform to the full dataset in the torchvision.transforms.Compose?

You can define any custom transformation as a function and use torchvision.transforms.Lambda in the transformation pipeline.
def erase_middle(image: torch.Tensor) -> torch.Tensor:
    for i in range(int(image_size/2-5),int(image_size/2+3)):
        for j in range(int(image_size/2-5),int(image_size/2+3)):
            image[:, i, j] = 0
    return image
transform = torchvision.transforms.Compose(
    [
        # First transform it to a tensor
        torchvision.transforms.ToTensor(),
        # Then erase the middle
        torchvision.transforms.Lambda(erase_middle),
    ]
)
erase_middle can be made more generic, such that it works for images with varying sizes and that aren't necessarily square.
def erase_middle(image: torch.Tensor) -> torch.Tensor:
    _, height, width = image.size()
    x_start = width // 2 - 5
    x_end = width // 2 + 3
    y_start = height // 2 - 5
    y_end = height // 2 + 3
    # Using slices achieves the same as the for loops
    image[:, y_start:y_end, x_start:x_end] = 0
    return image
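As a variation on the function above, here is a minimal sketch of the same masking written as a small callable class, so the offsets become parameters. The class name EraseCenter and its before/after arguments are my own (hypothetical) names, not part of torchvision. It also works on a copy rather than modifying the input in place, and it assumes the image is larger than the masked square.

```python
import torch


class EraseCenter:
    """Zero out a square around the image centre (hypothetical helper)."""

    def __init__(self, before: int = 5, after: int = 3):
        # Pixels masked before and after the centre row/column.
        self.before = before
        self.after = after

    def __call__(self, image: torch.Tensor) -> torch.Tensor:
        # Expects a (channels, height, width) tensor, as produced by ToTensor().
        _, height, width = image.shape
        y0, y1 = height // 2 - self.before, height // 2 + self.after
        x0, x1 = width // 2 - self.before, width // 2 + self.after
        out = image.clone()  # leave the caller's tensor untouched
        out[:, y0:y1, x0:x1] = 0
        return out


original = torch.ones(1, 28, 28)
masked = EraseCenter()(original)
print(masked.sum().item())  # 784 ones minus the 8x8 masked square
```

Since Compose accepts any callable, an instance such as EraseCenter() should be usable in the pipeline in the same position as the Lambda above.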

Related

How to extract or get the image bounded by Detectron

I am working on creating bounding boxes upon images with my own created training dataset with the help of Detection, while I'm now stuck at the part of extracting the bounded image. I just want the image of the part inside the bounding box.
The input image to be predicted.
The predicted image with the bounding box outlines.
Please help me with this query. The resultant image should look like this.
Detection Function in Tensorflow
# Detection Function
detections = detect_fn(input_tensor)
bscores = detections['detection_scores'][0].numpy()
bclasses = detections['detection_classes'][0].numpy().astype(np.int32)
bboxes = detections['detection_boxes'][0].numpy()
det_boxes, class_labels = ExtractBBoxes(bboxes, bclasses, bscores, im_width, im_height, image_name=image_file)
Method to extract and crop bounding box
def ExtractBBoxes(bboxes, bclasses, bscores, im_width, im_height, image_name):
    bbox = []
    class_labels = []
    for idx in range(len(bboxes)):
        if bscores[idx] >= Threshold:
            # Region of interest: boxes are normalized [y_min, x_min, y_max, x_max]
            y_min = int(bboxes[idx][0] * im_height)
            x_min = int(bboxes[idx][1] * im_width)
            y_max = int(bboxes[idx][2] * im_height)
            x_max = int(bboxes[idx][3] * im_width)
            class_label = category_index[int(bclasses[idx])]['name']
            class_labels.append(class_label)
            bbox.append([x_min, y_min, x_max, y_max, class_label, float(bscores[idx])])
            # Crop image; tf.image.encode_jpeg expects uint8 pixel data
            cropped_image = tf.image.crop_to_bounding_box(image, y_min, x_min, y_max - y_min, x_max - x_min).numpy().astype(np.uint8)
            output_image = tf.image.encode_jpeg(cropped_image) #For Jpeg
            score = bscores[idx] * 100
            # Create a constant as filename
            file_name = tf.constant(youfilename)
            file = tf.io.write_file(file_name, output_image)
    # The caller unpacks two values, so return them
    return bbox, class_labels

How to create Bezier curves from B-Splines in Sympy?

I need to draw a smooth curve through some points, which I then want to show as an SVG path. So I create a B-Spline with scipy.interpolate, and can access some arrays that I suppose fully define it. Does someone know a reasonably simple way to create Bezier curves from these arrays?
import numpy as np
from scipy import interpolate
x = np.array([-1, 0, 2])
y = np.array([ 0, 2, 0])
x = np.r_[x, x[0]]
y = np.r_[y, y[0]]
tck, u = interpolate.splprep([x, y], s=0, per=True)
cx = tck[1][0]
cy = tck[1][1]
print( 'knots: ', list(tck[0]) )
print( 'coefficients x: ', list(cx) )
print( 'coefficients y: ', list(cy) )
print( 'degree: ', tck[2] )
print( 'parameter: ', list(u) )
The red points are the 3 initial points in x and y. The green points are the 6 coefficients in cx and cy. (Their values repeat after the 3rd, so each green point has two green index numbers.)
The return values tck and u are described in the scipy.interpolate.splprep documentation.
knots: [-1.0, -0.722, -0.372, 0.0, 0.277, 0.627, 1.0, 1.277, 1.627, 2.0]
# 0 1 2 3 4 5
coefficients x: [ 3.719, -2.137, -0.053, 3.719, -2.137, -0.053]
coefficients y: [-0.752, -0.930, 3.336, -0.752, -0.930, 3.336]
degree: 3
parameter: [0.0, 0.277, 0.627, 1.0]
Not sure starting with a B-Spline makes sense: form a catmull-rom curve through the points (with the virtual "before first" and "after last" overlaid on real points) and then convert that to a bezier curve using a relatively trivial transform? E.g. given your points p0, p1, and p2, the first segment would be a catmull-rom curve {p2,p0,p1,p2} for the segment p1--p2, {p0,p1,p2,p0} will yield p2--p0, and {p1, p2, p0, p1} will yield p0--p1. Then you trivially convert those and now you have your SVG path.
As a demonstration, go to https://editor.p5js.org/ and paste in the following code:
var points = [{x:150, y:100 },{x:50, y:300 },{x:300, y:300 }];
// add virtual points:
points = points.concat(points);
function setup() {
  createCanvas(400, 400);
  tension = createSlider(1, 200, 100);
}
function draw() {
  background(220);
  points.forEach(p => ellipse(p.x, p.y, 4));
  for (let n=0; n<3; n++) {
    let [c1, c2, c3, c4] = points.slice(n,n+4);
    let t = 0.06 * tension.value();
    bezier(
      // on-curve start point
      c2.x, c2.y,
      // control point 1
      c2.x + (c3.x - c1.x)/t,
      c2.y + (c3.y - c1.y)/t,
      // control point 2
      c3.x - (c4.x - c2.x)/t,
      c3.y - (c4.y - c2.y)/t,
      // on-curve end point
      c3.x, c3.y
    );
  }
}
Which will look like this:
Converting that to Python code should be an almost effortless exercise: there is barely any code for us to write =)
And, of course, now you're left with creating the SVG path, but that's hardly an issue: you know all the Bezier points now, so just start building your <path d=...> string while you iterate.
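For what it's worth, a Python version of the same idea might look like the sketch below. It assumes a closed loop of points and a fixed divisor of 6 (the value the p5 sketch gets at its default slider position, 0.06 * 100); the function name closed_catmull_rom_to_svg is made up for this example.

```python
def closed_catmull_rom_to_svg(points, divisor=6.0):
    """Return SVG path data for a closed smooth curve through `points`.

    points  -- list of (x, y) tuples, at least 3 of them
    divisor -- 6 gives the usual Catmull-Rom to cubic Bezier conversion
    """
    n = len(points)
    x0, y0 = points[0]
    parts = [f"M {x0:g} {y0:g}"]
    for i in range(n):
        # Neighbouring points wrap around so the curve closes on itself.
        p0 = points[(i - 1) % n]
        p1 = points[i]
        p2 = points[(i + 1) % n]
        p3 = points[(i + 2) % n]
        # Bezier control points for the segment p1 -> p2.
        c1 = (p1[0] + (p2[0] - p0[0]) / divisor, p1[1] + (p2[1] - p0[1]) / divisor)
        c2 = (p2[0] - (p3[0] - p1[0]) / divisor, p2[1] - (p3[1] - p1[1]) / divisor)
        parts.append(
            f"C {c1[0]:g} {c1[1]:g}, {c2[0]:g} {c2[1]:g}, {p2[0]:g} {p2[1]:g}"
        )
    parts.append("Z")
    return " ".join(parts)


# The three points from the question.
path = closed_catmull_rom_to_svg([(-1, 0), (0, 2), (2, 0)])
print(f'<path d="{path}" fill="none" stroke="black"/>')
```

Each loop iteration emits one cubic segment, so three input points give three C commands followed by Z to close the path.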
A B-spline curve is just a collection of Bezier curves joined together. Therefore, it is certainly possible to convert it back to multiple Bezier curves without any loss of shape fidelity. The algorithm involved is called "knot insertion", and there are different ways to do it, the two best-known being Boehm's algorithm and the Oslo algorithm. You can refer to this link for more details.
Here is an almost direct answer to your question (but for the non-periodic case):
import aggdraw
import numpy as np
import scipy.interpolate as si
from PIL import Image
# from https://stackoverflow.com/a/35007804/2849934
def scipy_bspline(cv, degree=3):
    """ cv: Array of control vertices
        degree: Curve degree
    """
    count = cv.shape[0]
    degree = np.clip(degree, 1, count-1)
    kv = np.clip(np.arange(count+degree+1)-degree, 0, count-degree)
    # non-periodic case only: the undefined `periodic` flag is taken as 0
    max_param = count - degree
    spline = si.BSpline(kv, cv, degree)
    return spline, max_param
# based on https://math.stackexchange.com/a/421572/396192
def bspline_to_bezier(cv):
    cv_len = cv.shape[0]
    assert cv_len >= 4, "Provide at least 4 control vertices"
    spline, max_param = scipy_bspline(cv, degree=3)
    # raise every interior knot to multiplicity 3 by inserting it twice
    for i in range(1, max_param):
        spline = si.insert(i, spline, 2)
    return spline.c[:3 * max_param + 1]
def draw_bezier(d, bezier):
    path = aggdraw.Path()
    path.moveto(*bezier[0])
    # each cubic segment consumes three more points: two controls and an end point
    for i in range(1, len(bezier) - 1, 3):
        v1, v2, v = bezier[i:i+3]
        path.curveto(*v1, *v2, *v)
    d.path(path, aggdraw.Pen("black", 2))
cv = np.array([[ 40., 148.], [ 40., 48.],
[244., 24.], [160., 120.],
[240., 144.], [210., 260.],
[110., 250.]])
im = Image.fromarray(np.ones((400, 400, 3), dtype=np.uint8) * 255)
bezier = bspline_to_bezier(cv)
d = aggdraw.Draw(im)
draw_bezier(d, bezier)
d.flush()
# show/save im
I didn't look much into the periodic case, but hopefully it's not too difficult.

How to get pixels per meter in MATLAB

I have a vector shapefile, in units of meters, representing the boundary of Germany. I am converting it to raster format so that each pixel represents 300 meters. After the conversion I read the image information using imfinfo() in MATLAB. However, the result reports the unit as "Inches". I am quite confused and do not know how to convert inches to meters as the pixel size unit. Could you please give me some idea?
% Code
R6 = shaperead('B6c.shp');
%Nord
XN6 = double(R6(4).X); YN6 = double(R6(4).Y);
XN6min = min(XN6(XN6>0)); XNmax = max(XN6);
YN6min = min(YN6(YN6>0)); YNmax = max(YN6);
%Bayern
XB6 = double(R6(7).X); YB6 = double(R6(7).Y);
XB6min = min(XB6(XB6>0)); XB6max = max(XB6);
YB6min = min(YB6(YB6>0)); YB6max = max(YB6);
%Schleswig-Holstein
XSH6 = double(R6(9).X); YSH6 = double(R6(9).Y);
XSH6min = min(XSH6(XSH6>0)); XSH6max = max(XSH6);
YSH6min = min(YSH6(YSH6>0)); YSH6max = max(YSH6);
%Sachsen
XS6 = double(R6(6).X); YS6 = double(R6(6).Y);
XS6min = min(XS6(XS6>0)); XS6max = max(XS6);
YS6min = min(YS6(YS6>0)); YS6max = max(YS6);
dx = round(XS6max-XN6min);
dy = round(YSH6max-YB6min);
M = round((dx)/300); N = round((dy)/300);
A6 = zeros(M,N); %initiating image matrix based on 4 limiting States
%transformation from world to pixel coordinates
xpix_bw =(((XBW-XN6min)*M)/dx)';
ypix_bw =(((YBW-YB6min)*N)/dy)';
xbw6=round(xpix_bw); xbw6=xbw6(~isnan(xbw6));
ybw6=round(ypix_bw); ybw6=ybw6(~isnan(ybw6));
%line drawing
for i=1:1:length(xbw6)-1
j=i+1;
x1=xbw6(i); x2=xbw6(j); y1=ybw6(i); y2=ybw6(j);
nn=atan2((y2-y1),(x2-x1)); % azimuthal angle
if x2==x1
l=abs(y2-y1);
else
l = round((x2-x1)/cos(nn)); % horizontal distance
end
xx=zeros(l,1); %empty column
yy=zeros(l,1); %empty column
% creating line along slope distance
for i=1:1:l
xx(i)=round(x1+cos(nn)*i);
yy(i)=round(y1+sin(nn)*i);
A6(xx(i)+1,yy(i)+1) = 256;
end
end
imwrite(A6, 'Untitled_0506_300.tif','Resolution', 300);

Can I plot a colorbar for a bokeh heatmap?

Does bokeh have a simple way to plot the colorbar for a heatmap?
In this example it would be a strip illustrating how colors correspond to values.
In MATLAB, it's called a 'colorbar' and looks like this:
UPDATE: This is now much easier: see
http://docs.bokeh.org/en/latest/docs/user_guide/annotations.html#color-bars
I'm afraid I don't have a great answer, this should be easier in Bokeh. But I have done something like this manually before.
Because I often want these off my plot, I make a new plot, and then assemble it together with something like hplot or gridplot.
There is an example of this here: https://github.com/birdsarah/pycon_2015_bokeh_talk/blob/master/washmap/washmap/water_map.py#L179
In your case, the plot should be pretty straightforward. If you made a datasource like this:
| value | color
| 1 | blue
.....
| 9 | red
Then you could do something like:
legend = figure(tools=None)
legend.toolbar_location=None
legend.rect(x=0.5, y='value', fill_color='color', width=1, height=1, source=source)
layout = hplot(main, legend)
# show the combined layout, not just the legend figure
show(layout)
However, this does rely on you knowing the colors that your values correspond to. You can pass a palette to your heatmap chart call, as shown here: http://docs.bokeh.org/en/latest/docs/gallery/cat_heatmap_chart.html, and then use that palette to construct the new data source.
I'm pretty sure there's at least one open issue around color maps. I know I just added one for off-plot legends.
Since the other answers here seem very complicated, here is an easily understandable piece of code that generates a colorbar on a bokeh heatmap.
import numpy as np
from bokeh.plotting import figure, show
from bokeh.models import LinearColorMapper, BasicTicker, ColorBar
data = np.random.rand(10,10)
color_mapper = LinearColorMapper(palette="Viridis256", low=0, high=1)
plot = figure(x_range=(0,1), y_range=(0,1))
plot.image(image=[data], color_mapper=color_mapper,
dh=[1.0], dw=[1.0], x=[0], y=[0])
color_bar = ColorBar(color_mapper=color_mapper, ticker= BasicTicker(),
location=(0,0))
plot.add_layout(color_bar, 'right')
show(plot)
Since the 0.12.3 version Bokeh has the ColorBar.
This documentation was very useful to me:
http://docs.bokeh.org/en/dev/docs/user_guide/annotations.html#color-bars
To do this I did the same as @birdsarah. As an extra tip, though: if you use the rect method for your colour map, use the rect method again for the colour bar, with the same source. The end result is that selecting sections of the colour bar also selects them in your plot.
Try it out:
http://simonbiggs.github.io/electronfactors
Here is some code loosely based on birdsarah's response for generating a colorbar:
import numpy as np
import bokeh.plotting as bp
def generate_colorbar(palette, low=0, high=15, plot_height = 100, plot_width = 500, orientation = 'h'):
    # one rectangle per palette entry, spread evenly over [low, high]
    y = np.linspace(low,high,len(palette))
    dy = y[1]-y[0]
    if orientation.lower()=='v':
        fig = bp.figure(tools="", x_range = [0, 1], y_range = [low, high], plot_width = plot_width, plot_height=plot_height)
        fig.toolbar_location=None
        fig.xaxis.visible = None
        fig.rect(x=0.5, y=y, color=palette, width=1, height = dy)
    elif orientation.lower()=='h':
        fig = bp.figure(tools="", y_range = [0, 1], x_range = [low, high],plot_width = plot_width, plot_height=plot_height)
        fig.toolbar_location=None
        fig.yaxis.visible = None
        fig.rect(x=y, y=0.5, color=palette, width=dy, height = 1)
    return fig
Also, if you are interested in emulating matplotlib colormaps, try using this:
import numpy as np
import matplotlib as mpl
def return_bokeh_colormap(name):
    cm = mpl.cm.get_cmap(name)
    # np.int was removed from NumPy; the builtin int does the same job here
    colormap = [rgb_to_hex(tuple((np.array(cm(x))*255).astype(int))) for x in range(0,cm.N)]
    return colormap
def rgb_to_hex(rgb):
    return '#%02x%02x%02x' % rgb[0:3]
This is high on my wish list as well. It would also need to automatically adjust the range if the plotted data changed (e.g. moving through one dimension of a 3D data set). The code below does something which people might find useful. The trick is to add an extra axis to the colourbar which you can control through a data source when the data changes.
import numpy
from bokeh.plotting import Figure
from bokeh.models import ColumnDataSource, Plot, LinearAxis
from bokeh.models.mappers import LinearColorMapper
from bokeh.models.ranges import Range1d
from bokeh.models.widgets import Slider
from bokeh.models.widgets.layouts import VBox
from bokeh.core.properties import Instance
from bokeh.palettes import RdYlBu11
from bokeh.io import curdoc
class Colourbar(VBox):
    plot = Instance(Plot)
    cbar = Instance(Plot)
    power = Instance(Slider)
    datasrc = Instance(ColumnDataSource)
    cbarrange = Instance(ColumnDataSource)
    cmap = Instance(LinearColorMapper)
    def __init__(self):
        self.__view_model__ = "VBox"
        self.__subtype__ = "MyApp"
        super(Colourbar,self).__init__()
        numslices = 6
        x = numpy.linspace(1,2,11)
        y = numpy.linspace(2,4,21)
        Z = numpy.ndarray([numslices,y.size,x.size])
        for i in range(numslices):
            for j in range(y.size):
                for k in range(x.size):
                    Z[i,j,k] = (y[j]*x[k])**(i+1) + y[j]*x[k]
        self.power = Slider(title = 'Power',name = 'Power',start = 1,end = numslices,step = 1,
                            value = round(numslices/2))
        self.power.on_change('value',self.inputchange)
        z = Z[self.power.value]
        self.datasrc = ColumnDataSource(data={'x':x,'y':y,'z':[z],'Z':Z})
        self.cmap = LinearColorMapper(palette = RdYlBu11)
        r = Range1d(start = z.min(),end = z.max())
        self.cbarrange = ColumnDataSource(data = {'range':[r]})
        self.plot = Figure(title="Colourmap plot",x_axis_label = 'x',y_axis_label = 'y',
                           x_range = [x[0],x[-1]],y_range=[y[0],y[-1]],
                           plot_height = 500,plot_width = 500)
        dx = x[1] - x[0]
        dy = y[1] - y[0]
        self.plot.image('z',source = self.datasrc,x = x[0]-dx/2, y = y[0]-dy/2,
                        dw = [x[-1]-x[0]+dx],dh = [y[-1]-y[0]+dy],
                        color_mapper = self.cmap)
        self.generate_colorbar()
        self.children.append(self.power)
        self.children.append(self.plot)
        self.children.append(self.cbar)
    def generate_colorbar(self,cbarlength = 500,cbarwidth = 50):
        pal = RdYlBu11
        minVal = self.datasrc.data['z'][0].min()
        maxVal = self.datasrc.data['z'][0].max()
        vals = numpy.linspace(minVal,maxVal,len(pal))
        self.cbar = Figure(tools = "",x_range = [minVal,maxVal],y_range = [0,1],
                           plot_width = cbarlength,plot_height = cbarwidth)
        self.cbar.toolbar_location = None
        self.cbar.min_border_left = 10
        self.cbar.min_border_right = 10
        self.cbar.min_border_top = 0
        self.cbar.min_border_bottom = 0
        self.cbar.xaxis.visible = None
        self.cbar.yaxis.visible = None
        self.cbar.extra_x_ranges = {'xrange':self.cbarrange.data['range'][0]}
        self.cbar.add_layout(LinearAxis(x_range_name = 'xrange'),'below')
        for r in self.cbar.renderers:
            if type(r).__name__ == 'Grid':
                r.grid_line_color = None
        self.cbar.rect(x = vals,y = 0.5,color = pal,width = vals[1]-vals[0],height = 1)
    def updatez(self):
        data = self.datasrc.data
        newdata = data
        z = data['z']
        z[0] = data['Z'][self.power.value - 1]
        newdata['z'] = z
        self.datasrc.trigger('data',data,newdata)
    def updatecbar(self):
        minVal = self.datasrc.data['z'][0].min()
        maxVal = self.datasrc.data['z'][0].max()
        self.cbarrange.data['range'][0].start = minVal
        self.cbarrange.data['range'][0].end = maxVal
    def inputchange(self,attrname,old,new):
        self.updatez()
        self.updatecbar()
curdoc().add_root(Colourbar())

Projection of circular region of interest onto rectangle [duplicate]

BOUNTY STATUS UPDATE:
I discovered how to map a linear lens, from destination coordinates to source coordinates.
How do you calculate the radial distance from the centre to go from fisheye to rectilinear?
1). I actually struggle to reverse it, and to map source coordinates to destination coordinates. What is the inverse, in code in the style of the converting functions I posted?
2). I also see that my undistortion is imperfect on some lenses - presumably those that are not strictly linear. What is the equivalent to-and-from source-and-destination coordinates for those lenses? Again, more code than just mathematical formulae please...
Question as originally stated:
I have some points that describe positions in a picture taken with a fisheye lens.
I want to convert these points to rectilinear coordinates. I want to undistort the image.
I've found this description of how to generate a fisheye effect, but not how to reverse it.
There's also a blog post that describes how to use tools to do it; these pictures are from that:
(1) : SOURCE Original photo link
Input : Original image with fish-eye distortion to fix.
(2) : DESTINATION Original photo link
Output : Corrected image (technically also with perspective correction, but that's a separate step).
How do you calculate the radial distance from the centre to go from fisheye to rectilinear?
My function stub looks like this:
Point correct_fisheye(const Point& p,const Size& img) {
    // to polar
    const Point centre = {img.width/2,img.height/2};
    const Point rel = {p.x-centre.x,p.y-centre.y};
    const double theta = atan2(rel.y,rel.x);
    double R = sqrt((rel.x*rel.x)+(rel.y*rel.y));
    // fisheye undistortion in here please
    //... change R ...
    // back to rectangular
    const Point ret = Point(centre.x+R*cos(theta),centre.y+R*sin(theta));
    fprintf(stderr,"(%d,%d) in (%d,%d) = %f,%f = (%d,%d)\n",p.x,p.y,img.width,img.height,theta,R,ret.x,ret.y);
    return ret;
}
Alternatively, I could somehow convert the image from fisheye to rectilinear before finding the points, but I'm completely befuddled by the OpenCV documentation. Is there a straightforward way to do it in OpenCV, and does it perform well enough to do it to a live video feed?
The description you mention states that the projection by a pin-hole camera (one that does not introduce lens distortion) is modeled by
R_u = f*tan(theta)
and the projection by common fisheye lens cameras (that is, distorted) is modeled by
R_d = 2*f*sin(theta/2)
You already know R_d and theta and if you knew the camera's focal length (represented by f) then correcting the image would amount to computing R_u in terms of R_d and theta. In other words,
R_u = f*tan(2*asin(R_d/(2*f)))
is the formula you're looking for. Estimating the focal length f can be solved by calibrating the camera or other means such as letting the user provide feedback on how well the image is corrected or using knowledge from the original scene.
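As a rough illustration of that formula, here is a small Python sketch that fills in the "change R" step of the stub in the question. It assumes the equisolid model above holds for the lens and that f is already known in pixels; the function names and the example numbers (a 640x480 image with f = 400) are made up for illustration.

```python
import math


def undistorted_radius(r_d, f):
    """R_u = f * tan(2 * asin(R_d / (2 * f))), with all lengths in pixels."""
    ratio = r_d / (2.0 * f)
    # tan() blows up when theta reaches 90 degrees, i.e. when asin(ratio) = 45 degrees.
    if not 0.0 <= ratio < math.sin(math.pi / 4):
        raise ValueError("R_d is outside the range this model can undistort")
    return f * math.tan(2.0 * math.asin(ratio))


def correct_fisheye(x, y, width, height, f):
    """Map a point in the fisheye image to rectilinear coordinates."""
    cx, cy = width / 2.0, height / 2.0
    dx, dy = x - cx, y - cy
    r_d = math.hypot(dx, dy)
    if r_d == 0.0:
        return cx, cy  # the centre does not move
    # Scaling the offset keeps the angle theta unchanged and only changes R.
    scale = undistorted_radius(r_d, f) / r_d
    return cx + dx * scale, cy + dy * scale


print(correct_fisheye(420, 240, 640, 480, 400.0))
```

Points move away from the centre, and more so the further out they start, which is what undoing barrel distortion should do.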
In order to solve the same problem using OpenCV, you would have to obtain the camera's intrinsic parameters and lens distortion coefficients. See, for example, Chapter 11 of Learning OpenCV (don't forget to check the correction). Then you can use a program such as this one (written with the Python bindings for OpenCV) in order to reverse lens distortion:
#!/usr/bin/python
# ./undistort 0_0000.jpg 1367.451167 1367.451167 0 0 -0.246065 0.193617 -0.002004 -0.002056
import sys
import cv
def main(argv):
    if len(argv) < 10:
        print 'Usage: %s input-file fx fy cx cy k1 k2 p1 p2 output-file' % argv[0]
        sys.exit(-1)
    src = argv[1]
    fx, fy, cx, cy, k1, k2, p1, p2, output = argv[2:]
    intrinsics = cv.CreateMat(3, 3, cv.CV_64FC1)
    cv.Zero(intrinsics)
    intrinsics[0, 0] = float(fx)
    intrinsics[1, 1] = float(fy)
    intrinsics[2, 2] = 1.0
    intrinsics[0, 2] = float(cx)
    intrinsics[1, 2] = float(cy)
    dist_coeffs = cv.CreateMat(1, 4, cv.CV_64FC1)
    cv.Zero(dist_coeffs)
    dist_coeffs[0, 0] = float(k1)
    dist_coeffs[0, 1] = float(k2)
    dist_coeffs[0, 2] = float(p1)
    dist_coeffs[0, 3] = float(p2)
    src = cv.LoadImage(src)
    dst = cv.CreateImage(cv.GetSize(src), src.depth, src.nChannels)
    mapx = cv.CreateImage(cv.GetSize(src), cv.IPL_DEPTH_32F, 1)
    mapy = cv.CreateImage(cv.GetSize(src), cv.IPL_DEPTH_32F, 1)
    cv.InitUndistortMap(intrinsics, dist_coeffs, mapx, mapy)
    cv.Remap(src, dst, mapx, mapy, cv.CV_INTER_LINEAR + cv.CV_WARP_FILL_OUTLIERS, cv.ScalarAll(0))
    # cv.Undistort2(src, dst, intrinsics, dist_coeffs)
    cv.SaveImage(output, dst)
if __name__ == '__main__':
    main(sys.argv)
Also note that OpenCV uses a very different lens distortion model to the one in the web page you linked to.
(Original poster, providing an alternative)
The following function maps destination (rectilinear) coordinates to source (fisheye-distorted) coordinates. (I'd appreciate help in reversing it)
I got to this point through trial-and-error: I don't fundamentally grasp why this code is working, explanations and improved accuracy appreciated!
from math import sqrt, atan
def dist(x,y):
    return sqrt(x*x+y*y)
def correct_fisheye(src_size,dest_size,dx,dy,factor):
    """ returns a tuple of source coordinates (sx,sy)
    (note: values can be out of range)"""
    # convert dx,dy to relative coordinates
    rx, ry = dx-(dest_size[0]/2), dy-(dest_size[1]/2)
    # calc theta
    r = dist(rx,ry)/(dist(src_size[0],src_size[1])/factor)
    if 0==r:
        theta = 1.0
    else:
        theta = atan(r)/r
    # back to absolute coordinates
    sx, sy = (src_size[0]/2)+theta*rx, (src_size[1]/2)+theta*ry
    # done
    return (int(round(sx)),int(round(sy)))
When used with a factor of 3.0, it successfully undistorts the images used as examples (I made no attempt at quality interpolation):
Dead link
(And this is from the blog post, for comparison:)
If you think your formulas are exact, you can compute an exact formula with trig, like so:
Rin = 2 f sin(w/2) -> sin(w/2)= Rin/2f
Rout= f tan(w) -> tan(w)= Rout/f
(Rin/2f)^2 = [sin(w/2)]^2 = (1 - cos(w))/2 -> cos(w) = 1 - 2(Rin/2f)^2
(Rout/f)^2 = [tan(w)]^2 = 1/[cos(w)]^2 - 1
-> (Rout/f)^2 = 1/(1-2[Rin/2f]^2)^2 - 1
However, as @jmbr says, the actual camera distortion will depend on the lens and the zoom. Rather than rely on a fixed formula, you might want to try a polynomial expansion:
Rout = Rin*(1 + A*Rin^2 + B*Rin^4 + ...)
By tweaking first A, then higher-order coefficients, you can compute any reasonable local function (the form of the expansion takes advantage of the symmetry of the problem). In particular, it should be possible to compute initial coefficients to approximate the theoretical function above.
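To make the last point concrete, here is a short sketch comparing the exact formula with the polynomial. The starting value A = 3/(8 f^2) is my own Taylor expansion of f*tan(2*asin(Rin/(2f))) around Rin = 0, not something taken from a lens datasheet, and f = 400 is an arbitrary example value.

```python
import math


def exact_rout(r_in, f):
    """Theoretical model: Rout = f * tan(2 * asin(Rin / (2 f)))."""
    return f * math.tan(2.0 * math.asin(r_in / (2.0 * f)))


def poly_rout(r_in, a, b=0.0):
    """Polynomial model: Rout = Rin * (1 + A*Rin^2 + B*Rin^4)."""
    r2 = r_in * r_in
    return r_in * (1.0 + a * r2 + b * r2 * r2)


f = 400.0
# Leading Taylor coefficient of the exact model (assumed starting guess for A).
a0 = 3.0 / (8.0 * f * f)

for r_in in (50.0, 100.0, 200.0):
    print(r_in, exact_rout(r_in, f), poly_rout(r_in, a0))
```

Near the centre the two agree closely, and the gap grows with radius, which is where tweaking A and then B against real images comes in.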
Also, for good results, you will need to use an interpolation filter to generate your corrected image. As long as the distortion is not too great, you can use the kind of filter you would use to rescale the image linearly without much problem.
Edit: as per your request, the equivalent scaling factor for the above formula:
(Rout/f)^2 = 1/(1-2[Rin/2f]^2)^2 - 1
-> Rout/f = [Rin/f] * sqrt(1-[Rin/f]^2/4)/(1-[Rin/f]^2/2)
If you plot the above formula alongside tan(Rin/f), you can see that they are very similar in shape. Basically, distortion from the tangent becomes severe before sin(w) becomes much different from w.
The inverse formula should be something like:
Rin/f = [Rout/f] / sqrt( sqrt([Rout/f]^2+1) * (sqrt([Rout/f]^2+1) + 1) / 2 )
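A quick way to sanity-check the two closed forms is to run one through the other. The sketch below reads the inverse as Rin/f = t / sqrt(s*(s+1)/2) with t = Rout/f and s = sqrt(t^2+1); the function names are just for this example.

```python
import math


def rout_over_f(x):
    """Forward scaling: x = Rin/f -> Rout/f (valid for x < sqrt(2))."""
    return x * math.sqrt(1.0 - x * x / 4.0) / (1.0 - x * x / 2.0)


def rin_over_f(t):
    """Inverse scaling: t = Rout/f -> Rin/f."""
    s = math.sqrt(t * t + 1.0)
    return t / math.sqrt(s * (s + 1.0) / 2.0)


# Round trip: the inverse should give back the value we started from.
for x in (0.1, 0.5, 1.0):
    print(x, rin_over_f(rout_over_f(x)))
```

If the round trip returns the starting values (up to floating-point error), the pair of formulas is at least self-consistent.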
I blindly implemented the formulas from here, so I cannot guarantee it would do what you need.
Use fisheye_auto_zoom to get the value for the zoom parameter.
from math import sqrt, tan, pi
def dist(x,y):
    return sqrt(x*x+y*y)
def fisheye_to_rectilinear(src_size,dest_size,sx,sy,crop_factor,zoom):
    """ returns a tuple of dest coordinates (dx,dy)
    (note: values can be out of range)
    crop_factor is ratio of sphere diameter to diagonal of the source image"""
    # convert sx,sy to relative coordinates
    rx, ry = sx-(src_size[0]/2), sy-(src_size[1]/2)
    r = dist(rx,ry)
    # the centre maps to the centre (this also avoids dividing by r == 0 below)
    if r == 0:
        return (int(round(dest_size[0]/2)),int(round(dest_size[1]/2)))
    # focal distance = radius of the sphere
    f = dist(src_size[0],src_size[1])*crop_factor/pi
    # calc theta 1) linear mapping (older Nikon)
    theta = r / f
    # calc theta 2) nonlinear mapping
    # theta = asin ( r / ( 2 * f ) ) * 2
    # calc new radius
    nr = tan(theta) * zoom
    # back to absolute coordinates
    dx, dy = (dest_size[0]/2)+rx/r*nr, (dest_size[1]/2)+ry/r*nr
    # done
    return (int(round(dx)),int(round(dy)))
def fisheye_auto_zoom(src_size,dest_size,crop_factor):
    """ calculate zoom such that left edge of source image matches left edge of dest image """
    # Try to see what happens with zoom=1
    dx, dy = fisheye_to_rectilinear(src_size, dest_size, 0, src_size[1]/2, crop_factor, 1)
    # Calculate zoom so the result is what we wanted
    obtained_r = dest_size[0]/2 - dx
    required_r = dest_size[0]/2
    zoom = required_r / obtained_r
    return zoom
I took what JMBR did and basically reversed it. He took the radius of the distorted image (Rd, that is, the distance in pixels from the center of the image) and found a formula for Ru, the radius of the undistorted image.
You want to go the other way. For each pixel in the undistorted (processed image), you want to know what the corresponding pixel is in the distorted image.
In other words, given (xu, yu) --> (xd, yd). You then replace each pixel in the undistorted image with its corresponding pixel from the distorted image.
Starting where JMBR did, I do the reverse, finding Rd as a function of Ru. I get:
Rd = f * sqrt(2) * sqrt( 1 - 1/sqrt(r^2 +1))
where f is the focal length in pixels (I'll explain later), and r = Ru/f.
The focal length for my camera was 2.5 mm. The size of each pixel on my CCD was 6 um square. f was therefore 2500/6 = 417 pixels. This can be found by trial and error.
Finding Rd allows you to find the corresponding pixel in the distorted image using polar coordinates.
The angle of each pixel from the center point is the same:
theta = arctan( (yu-yc)/(xu-xc) ) where xc, yc are the center points.
Then,
xd = Rd * cos(theta) + xc
yd = Rd * sin(theta) + yc
Make sure you know which quadrant you are in.
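For readers who want the steps above in one place, here is a minimal Python sketch of the mapping from an undistorted pixel to its source pixel. It uses atan2 rather than arctan of the ratio, so the quadrant is handled automatically; the function name is invented for this example, and f = 417 is just the value worked out above for my camera.

```python
import math


def distorted_pixel(xu, yu, xc, yc, f):
    """Map undistorted (xu, yu) to distorted (xd, yd); f is in pixels."""
    dx, dy = xu - xc, yu - yc
    r = math.hypot(dx, dy) / f  # r = Ru / f
    # Rd = f * sqrt(2) * sqrt(1 - 1/sqrt(r^2 + 1))
    rd = f * math.sqrt(2.0) * math.sqrt(1.0 - 1.0 / math.sqrt(r * r + 1.0))
    # atan2 looks at the signs of dy and dx, so every quadrant comes out right.
    theta = math.atan2(dy, dx)
    return xc + rd * math.cos(theta), yc + rd * math.sin(theta)


# A pixel in the upper-left quadrant of a 640x480 image.
print(distorted_pixel(100, 100, 320, 240, 417.0))
```

The returned coordinates are floats; round them, or interpolate between neighbouring pixels, before reading from the distorted image.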
Here is the C# code I used
public class Analyzer
{
    private ArrayList mFisheyeCorrect;
    private int mFELimit = 1500;
    private double mScaleFESize = 0.9;
    public Analyzer()
    {
        //A lookup table so we don't have to calculate Rdistorted over and over
        //The values will be multiplied by focal length in pixels to
        //get the Rdistorted
        mFisheyeCorrect = new ArrayList(mFELimit);
        //i corresponds to Rundist/focalLengthInPixels * 1000 (to get integers)
        for (int i = 0; i < mFELimit; i++)
        {
            double result = Math.Sqrt(1 - 1 / Math.Sqrt(1.0 + (double)i * i / 1000000.0)) * 1.4142136;
            mFisheyeCorrect.Add(result);
        }
    }
    public Bitmap RemoveFisheye(ref Bitmap aImage, double aFocalLinPixels)
    {
        Bitmap correctedImage = new Bitmap(aImage.Width, aImage.Height);
        //The center points of the image
        double xc = aImage.Width / 2.0;
        double yc = aImage.Height / 2.0;
        Boolean xpos, ypos;
        //Move through the pixels in the corrected image;
        //set to corresponding pixels in distorted image
        for (int i = 0; i < correctedImage.Width; i++)
        {
            for (int j = 0; j < correctedImage.Height; j++)
            {
                //which quadrant are we in?
                xpos = i > xc;
                ypos = j > yc;
                //Find the distance from the center
                double xdif = i-xc;
                double ydif = j-yc;
                //The distance squared
                double Rusquare = xdif * xdif + ydif * ydif;
                //the angle from the center
                double theta = Math.Atan2(ydif, xdif);
                //find index for lookup table
                int index = (int)(Math.Sqrt(Rusquare) / aFocalLinPixels * 1000);
                if (index >= mFELimit) index = mFELimit - 1;
                //calculated Rdistorted
                double Rd = aFocalLinPixels * (double)mFisheyeCorrect[index]
                            /mScaleFESize;
                //calculate x and y distances
                double xdelta = Math.Abs(Rd*Math.Cos(theta));
                double ydelta = Math.Abs(Rd * Math.Sin(theta));
                //convert to pixel coordinates
                int xd = (int)(xc + (xpos ? xdelta : -xdelta));
                int yd = (int)(yc + (ypos ? ydelta : -ydelta));
                xd = Math.Max(0, Math.Min(xd, aImage.Width-1));
                yd = Math.Max(0, Math.Min(yd, aImage.Height-1));
                //set the corrected pixel value from the distorted image
                correctedImage.SetPixel(i, j, aImage.GetPixel(xd, yd));
            }
        }
        return correctedImage;
    }
}
I found this pdf file and I have proved that the maths are correct (except for the line vd = xd*fv+v0, which should say vd = yd*fv+v0).
http://perception.inrialpes.fr/CAVA_Dataset/Site/files/Calibration_OpenCV.pdf
It does not use all of the latest co-efficients that OpenCV has available but I am sure that it could be adapted fairly easily.
double k1 = cameraIntrinsic.distortion[0];
double k2 = cameraIntrinsic.distortion[1];
double p1 = cameraIntrinsic.distortion[2];
double p2 = cameraIntrinsic.distortion[3];
double k3 = cameraIntrinsic.distortion[4];
double fu = cameraIntrinsic.focalLength[0];
double fv = cameraIntrinsic.focalLength[1];
double u0 = cameraIntrinsic.principalPoint[0];
double v0 = cameraIntrinsic.principalPoint[1];
double u, v;
u = thisPoint->x; // the undistorted point
v = thisPoint->y;
double x = ( u - u0 )/fu;
double y = ( v - v0 )/fv;
double r2 = (x*x) + (y*y);
double r4 = r2*r2;
double cDist = 1 + (k1*r2) + (k2*r4);
double xr = x*cDist;
double yr = y*cDist;
double a1 = 2*x*y;
double a2 = r2 + (2*(x*x));
double a3 = r2 + (2*(y*y));
double dx = (a1*p1) + (a2*p2);
double dy = (a3*p1) + (a1*p2);
double xd = xr + dx;
double yd = yr + dy;
double ud = (xd*fu) + u0;
double vd = (yd*fv) + v0;
thisPoint->x = ud; // the distorted point
thisPoint->y = vd;
This can be solved as an optimization problem. Draw curves in the image along features that are supposed to be straight lines, and store the contour points of each curve. Then solve for the fisheye matrix as a minimization problem: minimizing the curvature of those point sets gives the fisheye matrix. It works.
It can also be done manually by adjusting the fisheye matrix using trackbars! Here is fisheye GUI code using OpenCV for manual calibration.