Insert a circle into geometry data type - tsql

I'm about to start using geometry or geography data types for the first time now that we have a development baseline of 2008R2 (!)
I'm struggling to find how to store the representation for a circle. We currently have the lat and long of the centre of the circle along with the radius, something like :-
[Lat] [float] NOT NULL,
[Long] [float] NOT NULL,
[Radius] [decimal](9, 4) NOT NULL,
Does anyone know the equivalent way to store this using the STGeomFromText method, ie which Well-Known Text (WKT) representation to use? I've looked at circular string (LINESTRING) and curve, but can't find any examples....
Thanks.

One thing you can do if you are using SQL Server 2008 is to buffer a point and store the resulting Polygon (as well-known binary, internally). For example,
declare #g geometry
set #g=geometry::STGeomFromText('POINT(0 0)', 4326).STBuffer(1)
select #g.ToString()
select #g.STNumPoints()
select #g.STArea()
This outputs, the WKT,
POLYGON ((0 -1, 0.051459848880767822 -0.99869883060455322, 0.10224419832229614 -0.99483710527420044, 0.15229016542434692 -0.98847776651382446, 0.20153486728668213 -0.97968357801437378, 0.24991559982299805 -0.96851736307144165,... , 0 -1))
the number of points, 129, from which it can be seen that buffering a circle uses 128 points plus a repeated start point and and the area, 3.1412, which is accurate to 3 decimal places, and differs from the real value by 0.01%, which would be acceptable for many use cases.
If you want less accuracy (ie, less points), you can use the Reduce function to decrease the number of points, eg,
declare #g geometry
set #g=geometry::STGeomFromText('POINT(0 0)', 4326).STBuffer(1).Reduce(0.01)
which now produces a circle approximation with 33 points and an area of 3.122 (now 0.6% less than the real value of PI).
Less points will reduce storage and make queries such as STIntersects and STIntersection faster, but, obviously, at the cost of accuracy.
EDIT 1: As Jon Bellamy has pointed out, if you choose to use the Reduce function, the parameter needs to be scaled proportionally to the circle/buffer radius, as it is a sensitivity factor for removing points, based on the Ramer-Douglas-Peucker algorithm
EDIT 2: There is also a function, BufferWithTolerance, which can be used to approximate a circle with a polygon. The second parameter, tolerance effects how close this approximation will be: the lower the value, the more points and better approximation. The 3rd parameter is a bit, indicating whether the tolerance is relative or absolute in relation to the buffer radius. This function could be used instead of the STBuffer, Reduce combination to create a circle with more points.
The following query produces,
declare #g geometry
set #g=geometry::STGeomFromText('POINT(0 0)', 4326).BufferWithTolerance(1,0.0001,1)
select #g.STNumPoints()
select #g.STArea()
a "circle" of 321 points with an area of 3.1424, ie, within 0.02% of the true value of PI (but now larger) and actually less accurate than the simple buffer above. Increasing the tolerance further does not lead to any significant improvement in accuracy, which suggests there is an upper limit to this approach.
As MvG has said, there is no CircularString or CompoundCurve until SQL Server 2012, which would allow you to store circles more compactly and accurately, by building a CompoundCurve made up of two semi-circles, ie, using two CircularStrings.

As far as I can tell from the docs, CircularString was only added for SQL Server 2012. The only other instantiable curve appears to be LineString which, as the name suggests, encodes a sequence of line segments. So your best bet would be approximating the circle as a (possibly regular) polygon with a sufficient number of corners. If that is not acceptable, you might have to keep your current data structures in place, either exclusively or in addition to spatial data types to verify that a match there indeed matches the circle.
This answer was written purely from the docs, with no experience to support it.

Related

Determine in which polygons a point is

I have tremendous flows of point data (in 2D) (thousands every second). On this map I have several fixed polygons (dozens to a few hundreds of them).
I would like to determine in real time (the order of a few milliseconds on a rather powerful laptop) for each point in which polygons it lies (polygons can intersect).
I thought I'd use the ray casting algorithm.
Nevertheless, I need a way to preprocess the data, to avoid scanning every polygon.
I therefore consider using tree approaches (PM quadtree or Rtree ?). Is there any other relevant method ?
Is there a good PM Quadtree implementation you would recommend (in whatever language, preferably C(++), Java or Python) ?
I have developed a library of several multi-dimensional indexes in Java, it can be found here. It contains R*Tree, STR-Tree, 4 quadtrees (2 for points, 2 for rectangles) and a critbit tree (can be used for spatial data by interleaving the coordinates). I also developed the PH-Tree.
There are all rectange/point based trees, so you would have to convert your polygons into rectangles, for example by calculating the bounding box. For all returned bounding boxes you would have to calculate manually if the polygon really intersects with your point.
If your rectangles are not too elongated, this should still be efficient.
I usually find the PH-Tree the most efficient tree, it has fast building times and very fast query times if a point intersects with 100 rectangles or less (even better with 10 or less). STR/R*-trees are better with larger overlap sizes (1000+). The quadtrees are a bit unreliable, they have problems with numeric precision when inserting millions of elements.
Assuming a 3D tree with 1 million rectangles and on average one result per query, the PH-Tree requires about 3 microseconds per query on my desktop (i7 4xxx), i.e. 300 queries per millisecond.

MATLAB: Using CONVN for moving average on Matrix

I'm looking for a bit of guidance on using CONVN to calculate moving averages in one dimension on a 3d matrix. I'm getting a little caught up on the flipping of the kernel under the hood and am hoping someone might be able to clarify the behaviour for me.
A similar post that still has me a bit confused is here:
CONVN example about flipping
The Problem:
I have daily river and weather flow data for a watershed at different source locations.
So the matrix is as so,
dim 1 (the rows) represent each site
dim 2 (the columns) represent the date
dim 3 (the pages) represent the different type of measurement (river height, flow, rainfall, etc.)
The goal is to try and use CONVN to take a 21 day moving average at each site, for each observation point for each variable.
As I understand it, I should just be able to use a a kernel such as:
ker = ones(1,21) ./ 21.;
mat = randn(150,365*10,4);
avgmat = convn(mat,ker,'valid');
I tried playing around and created another kernel which should also work (I think) and set ker2 as:
ker2 = [zeros(1,21); ker; zeros(1,21)];
avgmat2 = convn(mat,ker2,'valid');
The question:
The results don't quite match and I'm wondering if I have the dimensions incorrect here for the kernel. Any guidance is greatly appreciated.
Judging from the context of your question, you have a 3D matrix and you want to find the moving average of each row independently over all 3D slices. The code above should work (the first case). However, the valid flag returns a matrix whose size is valid in terms of the boundaries of the convolution. Take a look at the first point of the post that you linked to for more details.
Specifically, the first 21 entries for each row will be missing due to the valid flag. It's only when you get to the 22nd entry of each row does the convolution kernel become completely contained inside a row of the matrix and it's from that point where you get valid results (no pun intended). If you'd like to see these entries at the boundaries, then you'll need to use the 'same' flag if you want to maintain the same size matrix as the input or the 'full' flag (which is default) which gives you the size of the output starting from the most extreme outer edges, but bear in mind that the moving average will be done with a bunch of zeroes and so the first 21 entries wouldn't be what you expect anyway.
However, if I'm interpreting what you are asking, then the valid flag is what you want, but bear in mind that you will have 21 entries missing to accommodate for the edge cases. All in all, your code should work, but be careful on how you interpret the results.
BTW, you have a symmetric kernel, and so flipping should have no effect on the convolution output. What you have specified is a standard moving averaging kernel, and so convolution should work in finding the moving average as you expect.
Good luck!

How to use Morton Order(z order curve) in range search?

How to use Morton Order in range search?
From the wiki, In the paragraph "Use with one-dimensional data structures for range searching",
it says
"the range being queried (x = 2, ..., 3, y = 2, ..., 6) is indicated
by the dotted rectangle. Its highest Z-value (MAX) is 45. In this
example, the value F = 19 is encountered when searching a data
structure in increasing Z-value direction. ......BIGMIN (36 in the
example).....only search in the interval between BIGMIN and MAX...."
My questions are:
1) why the F is 19? Why the F should not be 16?
2) How to get the BIGMIN?
3) Are there any web blogs demonstrate how to do the range search?
EDIT: The AWS Database Blog now has a detailed introduction to this subject.
This blog post does a reasonable job of illustrating the process.
When searching the rectangular space x=[2,3], y=[2,6]:
The minimum Z Value (12) is found by interleaving the bits of the lowest x and y values: 2 and 2, respectively.
The maximum Z value (45) is found by interleaving the bits of the highest x and y values: 3 and 6, respectively.
Having found the min and max Z values (12 and 45), we now have a linear range that we can iterate across that is guaranteed to contain all of the entries inside of our rectangular space. The data within the linear range is going to be a superset of the data we actually care about: the data in the rectangular space. If we simply iterate across the entire range, we are going to find all of the data we care about and then some. You can test each value you visit to see if it's relevant or not.
An obvious optimization is to try to minimize the amount of superfluous data that you must traverse. This is largely a function of the number of 'seams' that you cross in the data -- places where the 'Z' curve has to make large jumps to continue its path (e.g. from Z value 31 to 32 below).
This can be mitigated by employing the BIGMIN and LITMAX functions to identify these seams and navigate back to the rectangle. To minimize the amount of irrelevant data we evaluate, we can:
Keep a count of the number of consecutive pieces of junk data we've visited.
Decide on a maximum allowable value (maxConsecutiveJunkData) for this count. The blog post linked at the top uses 3 for this value.
If we encounter maxConsecutiveJunkData pieces of irrelevant data in a row, we initiate BIGMIN and LITMAX. Importantly, at the point at which we've decided to use them, we're now somewhere within our linear search space (Z values 12 to 45) but outside the rectangular search space. In the Wikipedia article, they appear to have chosen a maxConsecutiveJunkData value of 4; they started at Z=12 and walked until they were 4 values outside of the rectangle (beyond 15) before deciding that it was now time to use BIGMIN. Because maxConsecutiveJunkData is left to your tastes, BIGMIN can be used on any value in the linear range (Z values 12 to 45). Somewhat confusingly, the article only shows the area from 19 on as crosshatched because that is the subrange of the search that will be optimized out when we use BIGMIN with a maxConsecutiveJunkData of 4.
When we realize that we've wandered outside of the rectangle too far, we can conclude that the rectangle in non-contiguous. BIGMIN and LITMAX are used to identify the nature of the split. BIGMIN is designed to, given any value in the linear search space (e.g. 19), find the next smallest value that will be back inside the half of the split rectangle with larger Z values (i.e. jumping us from 19 to 36). LITMAX is similar, helping us to find the largest value that will be inside the half of the split rectangle with smaller Z values. The implementations of BIGMIN and LITMAX are explained in depth in the zdivide function explanation in the linked blog post.
It appears that the quoted example in the Wikipedia article has not been edited to clarify the context and assumptions. The approach used in that example is applicable to linear data structures that only allow sequential (forward and backward) seeking; that is, it is assumed that one cannot randomly seek to a storage cell in constant time using its morton index alone.
With that constraint, one's strategy begins with a full range that is the mininum morton index (16) and the maximum morton index (45). To make optimizations, one tries to find and eliminate large swaths of subranges that are outside the query rectangle. The hatched area in the diagram refers to what would have been accessed (sequentially) if such optimization (eliminating subranges) had not been applied.
After discussing the main optimization strategy for linear sequential data structures, it goes on to talk about other data structures with better seeking capability.

Find points near LineString in mongodb sorted by distance

I have an array of points representing a street (black line) and points, representing a places on map (red points). I want to find all the points near the specified street, sorted by distance. I also need to have the ability to specify max distance (blue and green areas). Here is a simple example:
I thought of using the $near operator but it only accepts Point as an input, not LineString.
How mongodb can handle this type of queries?
As you mentioned, Mongo currently doesn't support anything other than Point. Have you come across the concept of a route boxer? 1 It was very popular a few years back on Google Maps. Given the line that you've drawn, find stops that are within dist(x). It was done by creating a series of bounding boxes around each point in the line, and searching for points that fall within the bucket.
I stumbled upon your question after I just realised that Mongo only works with points, which is reasonable I assume.
I already have a few options of how to do it (they expand on what #mnemosyn says in the comment). With the dataset that I'm working on, it's all on the client-side, so I could use the routeboxer, but I would like to implement it server-side for performance reasons. Here are my suggestions:
break the LineString down into its individual coordinate sets, and query for $near using each of those, combine results and extract an unique set. There are algorithms out there for simplifying a complex line, by reducing the number of points, but a simple one is easy to write.
do the same as above, but as a stored procedure/function. I haven't played around with Mongo's stored functions, and I don't know how well they work with drivers, but this could be faster than the first option above as you won't have to do roundtrips, and depending on the machine that your instance(s) of Mongo is(are) hosted, calculations could be faster by microseconds.
Implement the routeboxer approach server-side (has been done in PHP), and then use either of the above 2 to find stops that are $within the resulting bounding boxes. Heck since the routeboxer method returns rectangles, it would be possible to merge all these rectangles into one polygon covering your route, and just do a $within on that. (What #mnemosyn suggested).
EDIT: I thought of this but forgot about it, but it might be possible to achieve some of the above using the aggregation framework.
It's something that I'm going to be working on soon (hopefully), I'll open-source my result(s) based on which I end up going with.
EDIT: I must mention though that 1 and 2 have the flaw that if you have 2 points in a line that are say 2km apart, and you want points that are within 1.8km of your line, you'll obviously miss all the points between that part of your line. The solution is to inject points onto your line when simplifying it (I know, beats the objective of reducing points when adding new ones back in).
The flaw with 3 then is that it won't always be accurate as some points within your polygon are likely to have a distance greater than your limit, though the difference wouldn't be a significant percentage of your limit.
[1] google maps utils routeboxer
As you said Mongo's $near only works on points not lines as the centre point however if you flip your premise from find points near the line to find the line near the point then you can use your points as the centre and line as the target
this is the difference between
foreach line find points near it
and
foreach point find line near it
if you have a large number of points to check you can combine this with nevi_me's answer to reduce the list of points that need checking to a much smaller subset

what is the best way to reduce complexity of geometries

so I'm playing around with the http://www.gadm.org/ dataset;
I want to go from lat & lon to a country and state (or equivalent).
So to simplify the data I'm grouping it up and unioning the geometies; so far so good. the results are great, I can pull back Belgium and it is fine.
I pull back australia and I get victoria because the thing is too damn large.
Now I honestly don't care too much about the level of detail; if lines are w/in 1 km of where they should be I'm ok (as long as shapes are bigger, not smaller)
What is the best approach to reduce the complexity of my geospatial objects so I end up with a quick and simple view of the world?
All data is kept as Geometry data.
As you've tagged the question with "tsql" I'm assuming you're using Sql Server. Thus, you already have an handy function called Reduce which you can apply on the geometry data type to simplify it.
For example (copied from the above link):
DECLARE #g geometry;
SET #g = geometry::STGeomFromText('LINESTRING(0 0, 0 1, 1 0, 2 1, 3 0, 4 1)', 0);
SELECT #g.Reduce(.75).ToString();
The function receives a tolerance argument with the simplification threshold.
I suppose complexity is determined only by the number of vertices in a shape. There are quite a number of shape simplifying algorithms to choose from (and maybe some source too).
As a simplistic approach, you can iterate over vertices and reject concave ones if the result does not intoduce an error too large (e.g. in terms of added area), preferably adjoining smaller segments into larger. A more sophisticated approach might break an existing segment to better remove smaller ones.