Prediction avoiding landmass - postgresql

I am working on a project where the following functions have to be implemented:
1. Predict the location of ships (in a maritime environment) at a future time. This can be done with a Kalman filter, an IMM filter, or other algorithms. The ships can be in any part of the world.
2. Avoid landmass during prediction.
3. Find the shortest path along the shorelines.
I am done with the first part, which is predicting without considering the shoreline information. I have problems with functions 2 and 3.
Problem in function 2
At times, the predicted location can fall on land, which is unacceptable.
I am using the following coastline shp file: http://openstreetmapdata.com/data/coastlines
This file contains the world shoreline data converted to XY values.
I have loaded this shp file into PostgreSQL and use PostGIS to read it from the database.
My idea is to go through all the polygons (the shoreline is defined as polygons) and check whether the line connecting the present location and the predicted location crosses a polygon. If it does, we have to find where the ship first intercepts the shoreline.
If I go through all the polygons this way, it will take forever (there are around 62000 polygons, each with thousands of points). Any advice on this? I thought about dividing the world map into hierarchical areas (Level 1: 10 polygons; Level 2: each polygon has 10 polygons inside), but I am not sure how to divide the world map from the above shp file into the polygon levels I require.
Is there any PostGIS functionality that would help with this, or any other library for this purpose? I believe this kind of functionality should already be available, but I have not been able to figure it out so far.
Function 3
Since we now know where the ship first intercepts the shoreline, we can predict its path along the shoreline using a shortest-path algorithm, given that we know the destination. But to do this, the shoreline map has to be divided into grids so that the shortest-path algorithm can be applied.
How can I make grids along the shorelines from this data? I am not doing image processing here; all I have is this shp file. Any advice is appreciated.
Or should I go with an image processing approach to build the shoreline grid? If so, please provide some links.

First, PostGIS is pretty fast. With the proper indexes, and as long as you keep your polygons reasonably small, good indexing and overlap operator support should make up for the number of polygons (bounding-box overlap tests with the && operator can use a GiST index, the standard spatial index in PostGIS).
62000 polygons globally is nothing. Write back when you have to check more than a few thousand whose bounding boxes overlap with your line.
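To make the bounding-box idea concrete, here is a minimal sketch in plain Python of what the index buys you: reject polygons whose bounding box does not overlap the track, and only then test individual edges to find the first crossing. It assumes planar XY coordinates (as in the shp file), ignores collinear overlaps, and all function names are made up for illustration. In PostGIS itself the same two steps would be done by the && operator and ST_Intersects.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, float]
Polygon = List[Point]  # ring of vertices, first vertex not repeated
Box = Tuple[float, float, float, float]


def bbox(points: List[Point]) -> Box:
    """Axis-aligned bounding box as (min_x, min_y, max_x, max_y)."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return min(xs), min(ys), max(xs), max(ys)


def bboxes_overlap(a: Box, b: Box) -> bool:
    """True when two bounding boxes share at least one point."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def segment_hit(p: Point, q: Point, a: Point, b: Point) -> Optional[float]:
    """Fraction t in [0, 1] along p->q where it crosses edge a->b, else None."""
    rx, ry = q[0] - p[0], q[1] - p[1]
    sx, sy = b[0] - a[0], b[1] - a[1]
    denom = rx * sy - ry * sx
    if denom == 0:  # parallel or collinear: ignored in this sketch
        return None
    t = ((a[0] - p[0]) * sy - (a[1] - p[1]) * sx) / denom
    u = ((a[0] - p[0]) * ry - (a[1] - p[1]) * rx) / denom
    return t if 0 <= t <= 1 and 0 <= u <= 1 else None


def first_landfall(p: Point, q: Point, polygons: List[Polygon]) -> Optional[Point]:
    """First point where the track p->q meets any polygon boundary."""
    track_box = bbox([p, q])
    best_t = None
    for poly in polygons:
        if not bboxes_overlap(track_box, bbox(poly)):
            continue  # cheap rejection, the job a spatial index does
        for i in range(len(poly)):
            t = segment_hit(p, q, poly[i], poly[(i + 1) % len(poly)])
            if t is not None and (best_t is None or t < best_t):
                best_t = t
    if best_t is None:
        return None
    return (p[0] + best_t * (q[0] - p[0]), p[1] + best_t * (q[1] - p[1]))
```

With a real index the bounding boxes are precomputed and organized in a tree, so the rejection step does not even have to visit most polygons.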
For the third problem, you are going one direction, right? I am wondering how hard it would be to write a tangent(point, vector, polygon) function which would return the closest tangent to a polygon along a certain vector (a vector could be represented by a (point, point) tuple). If you were to combine this with KNN searches, you ought to be able to plot a course using a WITH RECURSIVE query.

Related

Matlab calculate geographical distance to lat/lng polyline

In Matlab I would like to calculate the (shortest) distances between a set of independent points (m-by-2 matrix of lat/lng) and a set of polylines (n-by-2 matrix of lat/lng). The resulting table should be an n-by-m matrix with distances in km.
I have rewritten this JavaScript implementation (http://www.bdcc.co.uk/Gmaps/BdccGeo.js) to Matlab, but it does not seem to perform well.
Currently I am working on a project with a relatively large data set and am running into performance issues. I have roughly 40,000 points and 150 polylines. The polylines are subsets of the original set of 40,000 points. At about 15 seconds per polyline, calculating all these distances can take up to an hour. Also, the intermediate matrices of size 40000x150x3 cause out-of-memory errors on my lesser machines.
Instead of optimizing or revising this implementation I am wondering if Matlab doesn't already have some (smarter) functions built in for this. But as far as I can see, the documentation mainly has information on how to display geodata as opposed to doing calculations on it.
Does anyone know of, or have experience with, this kind of calculation in Matlab? Has anything like this already been written that I can reuse, so I don't have to reinvent the wheel? And finally, is this the expected performance given these numbers, or should my function be able to perform much better?

Geometry from 2D point cloud with MATLAB

I have an unorganized point cloud and I want to compare the ideal CAD geometry with a profile measurement of an object. For example, I have the CAD data of the ideal object and a point cloud like this:
How can I compare these two data sets? From the CAD file I know whether a point on the CAD data belongs to a line or to a radius (arc), but how can I derive the radius error of an arc or the length error of a line?
I tried to organize the data with knnsearch, but the results were not satisfying. So I tried drawing a line starting from one point (say Point 1) to its closest point (say Point 2). If the closest neighbour of Point 2 is Point 1, I go to the second-closest point of Point 2. That algorithm seemed good to me, but the results were not satisfying either: the connection lines jumped from one edge to the other.
I also thought that maybe I should convert the CAD data to a point cloud and compare each measured point with the closest point in the CAD point cloud. I know which points belong to a line and which belong to an arc, so I can calculate the mean error for a line or an arc. But I think the end points of lines and arcs will be trouble: comparison results at these points will have large errors.
On the other hand, the CAD geometry and the measurements will not always be convex or completely covered. Some non-convex geometries can be measured. For example, you can see measurements of an inverted V shape with some points missing. This is the worst case:
If there are some errors in the geometry estimation when the measurements are insufficient, that is acceptable to me.
CPU load is also an important criterion for me. There are 10,000 points, and I want to complete the filtering and geometry matching within 20 ms on an i7 processor.
Is there a robust solution for this?
OK, I'm answering my own question. Matlab has built-in functions for computational geometry.
The boundary function of this module has partially solved my problem. I can use it for non-convex geometries. For convex geometries, I calculate the center point of the geometry by simply averaging the points. Then I sort all points by angle using the atan2 function.
But I can't figure out how to find the geometries. One way is to use the CAD data and iterate over the geometries described in it to minimize the least-squares error against the point cloud. The other way is to create arcs and lines from the points directly. I don't know which way will be faster and more robust.
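The centroid-and-atan2 ordering mentioned above for convex shapes can be sketched as follows. This is an illustration in Python rather than Matlab, the function name is made up, and it only gives a sensible outline when the shape is convex (or at least star-shaped around its centroid).

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def order_around_centroid(points: List[Point]) -> List[Point]:
    """Sort points counter-clockwise by angle around their mean position."""
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    # atan2 returns the angle in (-pi, pi], so the order starts just past
    # the negative x axis and runs counter-clockwise.
    return sorted(points, key=lambda p: math.atan2(p[1] - cy, p[0] - cx))
```

For example, the four corners of a square given in scrambled order come back as a closed walk around the square.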

Clusters based on distance

Here is my problem: I have a list of villages. For each village I computed the path distance between them and prepared a distance matrix. Now I want to identify clusters of villages which are close to each other.
I use Python 2.7 and I have already used hierarchical clustering (provided by SciPy) to cluster the distance matrix. Looking at the result as a human being, I can identify the nearest villages, but I need to automate it: I need to get the elements that belong to each cluster.
I was also wondering how to retrieve the clusters once I had created and cut the dendrogram. Since this is unanswered and may come up for others with a similar question, I'll answer according to what I was looking for, making some assumptions since this is an old question.
The first step is that you need to determine where to cut the dendrogram. You can do this a variety of ways, but I'll assume you already know how to do this, since you're looking at the dendrogram and seem to have satisfied yourself that you have clustered the data. If you don't know where to cut, you could start with something simple like cutting at the max distance. But really, where to cut is a different, very long discussion which I will assume you have figured out how to do (since I had done so at this point in my search).
Now I assume you have a dendrogram, and you know where to cut it, and maybe you even have it plotted with the cut line. But you want to do something more with the clusters, so you need to label the points you clustered. This can be done using the flat cluster (fcluster()) function in scipy.
from scipy.cluster.hierarchy import fcluster
# Z: linkage matrix from scipy's linkage(); distance: height at which to cut
clusters = fcluster(Z, distance, criterion='distance')
print(clusters)
Z is the hierarchical linkage matrix (as from scipy's linkage() function) which I assume you had already created. distance is the distance at which you are cutting the dendrogram (but there are other ways to cut the dendrogram, see source for how to do this with fcluster).
This returns a numpy array denoting which observation is in which cluster. Now you can append this to your data as a new column and go to town (or village) with it.
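To get from that label array to "the elements which belong to each cluster", a small grouping step is enough. The sketch below assumes the labels are in the same order as the rows of the distance matrix used to build the linkage; the function name and the village names are made up for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def group_by_cluster(names: Sequence[str], labels: Sequence[int]) -> Dict[int, List[str]]:
    """Map each cluster label to the names of the observations carrying it."""
    if len(names) != len(labels):
        raise ValueError("names and labels must have the same length")
    groups: Dict[int, List[str]] = defaultdict(list)
    for name, label in zip(names, labels):
        groups[int(label)].append(name)  # int() also accepts numpy integers
    return dict(groups)


# Hypothetical usage with labels as fcluster would return them:
# group_by_cluster(["A", "B", "C", "D"], [1, 2, 1, 2])
```

Passing the fcluster output directly as labels works, since iterating over a numpy array yields one label per observation.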

Writing ELKI DBSCAN convex hull of clusters to file

I have started using ELKI for data analysis, but one seemingly simple thing I cannot seem to do is output the calculated convex hull of clusters to a file after running DBSCAN. I am able to visualize the convex hulls via the visualization gui, but I cannot generate the KML file. I am also able to write my clustering results to a folder (using the ResultWriter resulthandler), but no file is generated when I set the KMLOutputHandler. I receive no error message in the log window (even with verbose parameter set to true).
Is there a trick to generating a KML file in ELKI? Could anyone walk through the steps of doing this?
Any help would be appreciated.
(as an aside, is it possible to generate alpha shapes for DBSCAN results with ELKI? If so, which parameter must be adjusted?)
So that is actually a lot of questions in one...
Convex hulls: they are used in ELKI for visualization, but are not considered part of the output result, so they are not saved to file. A trick you could employ is to save the visualization as SVG and extract them from this file, but they will then be in a different coordinate system.
One of the reasons for this is that the convex hulls are only implemented for 2D Euclidean space. I figure you want to use them for spatial data, where they may not be the correct convex hulls because of the curvature of the Earth's surface. Furthermore, many data sets will be of higher dimensionality.
However, you can of course look at the source code and invoke the convex hull algorithm, then write the result to your favorite output format. In general, just as you will need to spend time on preprocessing, you will also need to customize the output.
Which brings me to the second question. The KMLOutputHandler is closely tied to the publication of ELKI 0.4.0: Spatial Outlier Detection: Data, Algorithms, Visualizations.
The title pretty much summarizes what this class does: it visualizes spatial outlier detection. It does not (yet) include code to visualize clusters of spatial data, for example. Unfortunately, a number of restrictions must be met to get any output from the class. Essentially, if it finds a Polygon relation and an OutlierResult that it can map to each other, it will output this to KML.
It is not yet a class that can write arbitrary results to KML. It probably needs a lot more documentation, too. Contributions of a more general output tool would be appreciated, but a customizable, automatic, general output to KML is really hard to do. In particular, you may end up having to include projection capabilities if someone is processing not latitude-longitude data but, e.g., UTM-projected data.
As such, I recommend looking at the source code of the class and customizing it to your needs. In my opinion, visualization to KML will always require a lot of customization.
To generate alpha shapes, you just need to set the -hull.alpha parameter to the desired alpha value. This gives only the hull, not the extended alpha shape: the optimal visualization of DBSCAN would likely consist of the alpha shape of the core points only, extended by a radius of epsilon, which should then include the border points. That is on the wish list, but not implemented. Note that the alpha shapes are computed in the visualization projection, not on the raw data. If the axes are scaled differently, the alpha shapes will look different. Again, you may be interested in using the class AlphaShape on the raw data vectors instead of exporting the projected visualization. Then you can easily write the resulting polygons to your custom visualization.
If you implement such a KML visualization using alpha shapes (or convex hulls) for clusters, I would appreciate if you could contribute this to ELKI to make it available for others as well. Thank you.

Detecting stamp (seal) imprints on a digital image with SIFT

I am working on an application that should determine whether an input image contains a stamp imprint and return its location. For RGB images I use color segmentation followed by verification (with various shape factors). For grayscale images I thought that SIFT plus verification would do the job, but SIFT only finds those stamps in the input image that I already have in my database.
In the ideal case it works really well, as shown in the image below.
Fig. 1.
http://i.stack.imgur.com/JHkUl.png
The problem occurs when the input image contains a stamp that does not exist in the database. The first thing I did was check whether there would be any matching key points if I compared a similar stamp to the one in the input image. In most cases there is not a single matching key point, and when there are some, they refer to parts of the input image other than the stamp, as shown in Fig. 2:
Fig. 2.
http://i.stack.imgur.com/coA4l.png
I also tried to find a match between the input and circle images, as the stamps are circular, but a circle image has very few key points, if any.
So I wonder if there is a different approach that would make SIFT a bit more useful in this exact case? I thought about creating a matrix with all descriptors and key points from my database and then looking for the smallest Euclidean distance between the input image and the matrix, but it probably won't work, as there are a lot of (unwanted) matching key points across the database (see Fig. 2).
I'm working with Matlab and tried both VLFeat and D. Lowe SIFT implementations.
Edit:
So I found a way to force SIFT to compute descriptors for user-defined points on an image. My test image contained a circle; the descriptors were computed and matched against input images, including the ones in Fig. 1 and 2. This process was repeated for scales from 0 to 10. Unfortunately, it didn't help either.
This is only a first hint and not a full answer to the SIFT questions.
My impression is that detecting a circle by matching it against an image of a circle via SIFT is not the best approach, especially if the circle you want to detect has some unknown texture inside.
The textbook algorithm for circle detection would be the Hough transform, which is mostly used for line detection but works for any kind of shape that can be described by a small number of parameters (colleagues tell me things get nasty above 3, but a circle has just X, Y and r). There are several implementations on the File Exchange; the link is just to one example. Hough circle detection requires you to put an upper bound on the radii you want to detect, but this seems OK for your application.
From the examples you provided it looks like you should get quite far if you can detect circles reliably.
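To show what the three-parameter voting looks like, here is a brute-force sketch in plain Python. It is not one of the File Exchange implementations; it assumes you already have a list of edge pixel coordinates (for example from an edge detector) and a bounded list of candidate radii, and the function and parameter names are made up for illustration.

```python
import math
from collections import Counter
from typing import Iterable, Tuple

Cell = Tuple[int, int, int]  # (center_x, center_y, radius)


def hough_circles(edge_points: Iterable[Tuple[int, int]],
                  radii: Iterable[int],
                  angle_steps: int = 90) -> Tuple[Cell, int]:
    """Return the (cx, cy, r) cell with the most votes and its vote count."""
    radii = list(radii)
    votes: Counter = Counter()
    for x, y in edge_points:
        for r in radii:
            # Every center at distance r from this edge pixel is a candidate.
            cells = set()
            for k in range(angle_steps):
                theta = 2 * math.pi * k / angle_steps
                cx = round(x - r * math.cos(theta))
                cy = round(y - r * math.sin(theta))
                cells.add((cx, cy, r))
            votes.update(cells)  # one vote per cell per edge pixel
    return votes.most_common(1)[0]
```

The accumulator grows with the number of edge pixels times the number of radii times the number of angle steps, which is why the upper bound on the radius matters in practice.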
Actually, I do not think SIFT will solve this problem. I've been playing around with SIFT for quite some time, and my conclusion is that it's really great for identifying identical patterns but not similar patterns.
Just have a look at the construction of the SIFT feature vector: the descriptor is composed of several histograms of gradients(!). If you have patterns in the database with blob-like structures very similar to those in the stamps, then you might have a chance. But if this does not hold, I guess you will not be very lucky.
From my point of view, you have more or less solved the problem of finding identical objects (stamps) and are now extending it to finding similar objects. These sound like the same problem, but in my past research I found them to be related rather than identical.
Do you have any runtime constraints in your application? There might be other approaches but in this case, more input about possible constraints might be useful.
Update regarding constraints:
So your next task might be to detect the unknown stamps, right?
This sounds like a classification task.
In your case I would first try to find a descriptor/representation (or SVM) that classifies images into stamp/no-stamp. To evaluate this, set up a database with ground truth containing a reasonable number of "unknown" stamps and other images, such as random snapshots from the letters, NOT containing stamps. This will be your test set.
Then try some descriptors/representations to calculate the distance/similarity between your images and classify your test set into the classes STAMP / NO-STAMP. Once you have found a descriptor/distance measure (or SVM) that classifies well, you could run a sliding-window approach over a letter to find a stamp. The sliding-window approach is certainly not a very fast method, but it is a very easy one.
Once you have reached this point, you can tune the detection, for example based on interest point detectors. But one step after the other...
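The sliding-window step described above can be sketched like this. It only generates the window boxes and applies a classifier to each; the classifier is passed in as a function and is purely a placeholder for whatever stamp/no-stamp descriptor or SVM you end up with. All names are made up for illustration.

```python
from typing import Callable, Iterator, List, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height)


def sliding_windows(width: int, height: int, win: int, step: int) -> Iterator[Box]:
    """Yield square boxes of side `win` covering an image of the given size."""
    if win <= 0 or step <= 0:
        raise ValueError("win and step must be positive")
    for y in range(0, height - win + 1, step):
        for x in range(0, width - win + 1, step):
            yield (x, y, win, win)


def find_stamps(width: int, height: int, win: int, step: int,
                is_stamp: Callable[[Box], bool]) -> List[Box]:
    """Return the boxes that the (hypothetical) classifier labels as stamp."""
    return [box for box in sliding_windows(width, height, win, step) if is_stamp(box)]
```

A smaller step gives better localization at the cost of more classifier calls, and running the same loop for several window sizes handles stamps of different scales.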