Changing max node capacity in M-tree affects the results - Scala

Posting the code for the entire tree would be pointless here (it's too long and chaotic), and I've been trying to fix this problem for a while, so I'm not really after a concrete solution but rather ideas as to why this might be happening. So:
I have a dataset of 1,000,000 coordinates and I insert them into the tree. I then do a range search: for MaxCapacity = 10 (and any value >= 10) I get the correct results, but if I switch to MaxCapacity = 4 the results are wrong. Yet if I shrink the dataset to about 20,000 coordinates, the results are correct again for MaxCapacity = 4.
To me this looks like an incorrect split algorithm that only shows up with small MaxCapacities and large datasets, where an enormous number of splits occur. But the split algorithm checks out for almost everything, so I can't really find a mistake there. Any other ideas? The tree is written in Scala; the promotion policy promotes the two points that are furthest away from each other, and for the split policy we iterate through the entries of the overflowing node and put each entry into the group of the promoted point it is closer to.

I don't know if anyone will be interested in this, but I found the cause. I thought the problem was in the split, but I was wrong. The problem was in the insert recursion, when choosing which node to descend into next in order to place the entry. I was choosing that node by calculating the distance between each node's center and the entry's point, and picking the node with the minimum distance.
This works fine if the entry happens to lie inside the radius of one or more nodes; in that case the minimum distance does what you want. But what if the entry doesn't lie inside any node's radius? Then the chosen node's radius also has to be expanded to contain the entry, so we need the node whose radius would expand the least if it adopted the entry. A node's center may be the closest to the entry while the expansion it needs is catastrophically big. I had not considered this case, and as a result entries were placed in the wrong nodes, causing huge expansions and therefore huge overlaps. Once I implemented this case the problem was fixed!
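For anyone hitting the same issue, this is roughly the selection rule in code. It's a minimal sketch rather than my actual tree code: Entry, Node and dist are illustrative stand-ins, and the children are assumed to be a non-empty sequence of covering balls.

case class Entry(x: Double, y: Double)

class Node(val center: Entry, var radius: Double)

def dist(a: Entry, b: Entry): Double =
  math.hypot(a.x - b.x, a.y - b.y)

// Prefer a child whose ball already covers the entry; otherwise pick the child
// that needs the smallest radius expansion, and grow its radius accordingly.
def chooseSubtree(children: Seq[Node], e: Entry): Node = {
  val covering = children.filter(c => dist(c.center, e) <= c.radius)
  val chosen =
    if (covering.nonEmpty) covering.minBy(c => dist(c.center, e))   // inside one or more balls
    else children.minBy(c => dist(c.center, e) - c.radius)          // outside all: least expansion
  chosen.radius = math.max(chosen.radius, dist(chosen.center, e))   // expand only if needed
  chosen
}

An entry that falls outside every ball now goes to the child that grows the least, which keeps the balls tight and the overlap small even with many splits.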

Related

More Efficient Way of Calculating Population from Data Grid and overlapping Polygon?

Folks! Apologies if this is a duplicate question; I've done some research on the topic but don't know if I'm heading in the right direction.
I have converted gridded population-density data into a MongoDB collection: each document has a geometry object defining the density cell as a five-node polygon (the fifth node matching the first) and a float value holding the population in that geographic region. Even though the database is huge, the population "records" are indexed as a 2dsphere, so I can quickly retrieve the ones that intersect a geo-polygon describing some type of weather event or other geofence polygon.
The issue comes when I try to add all of the boxes up. It takes an exceedingly long time, especially if the polygon covers a significant geographic area. The population data are 1 km^2 cells, and adding them up can take several seconds or, in the worst case, minutes!
I had the thought of creating a kind of quadtree structure in the database: a lower-resolution node set as a separate collection, and so on. When calculating population I could start with the lowest-resolution set and work my way down the node "tree" with several database calls until there are no more matches. While this would increase my database calls significantly, it would reduce the sheer number of elements I need to add up at the end, which is what takes the most computational time.
I could build these data bottom-up, finding each cell's neighbours and adding up the four population values that make up the next lower-resolution node set. This will, of course, blow up the database size and increase the number of queries to the database for a single population request.
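For what it's worth, the coarsening step I have in mind would look roughly like this; it's only a sketch, and Cell with its row/col/population fields is a placeholder for however the grid ends up being keyed.

case class Cell(row: Int, col: Int, population: Double)

// Group the fine cells into 2x2 blocks and sum their population to get the
// next lower-resolution level; repeat until one level covers the whole grid.
def coarsen(fine: Seq[Cell]): Seq[Cell] =
  fine.groupBy(c => (c.row / 2, c.col / 2))
      .map { case ((r, c), block) => Cell(r, c, block.map(_.population).sum) }
      .toSeq

Each level would be stored as its own collection; a query would then work top-down, summing coarse cells that fall entirely inside the polygon and only descending into cells that straddle the boundary.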
I haven't seen much of this done with databases. I'd like to keep it in a database (it could also be PostgreSQL), since that gives me the ability to quickly geo-query by point or area, and because I'm returning the result from an API call, time efficiency is of the essence!
Any advice or places to research would be greatly appreciated!!!

Compare all data in the database at the same time (real time)

I have a problem with my Android app. I have a value x (whatever it is) and I have data in the database, and I want to compare the value of x with all the data in the database at the same time, in real time.
The app uses SQLite.
I used a loop, but when the database is large my app lags while comparing all the data. My code is:
public void Check_Distance(Location Current_Location, ArrayList<Location> LocationArrayList1)
{
    for (int i = 0; i < LocationArrayList1.size(); i++)
    {
        // distance between the current location and the i-th stored location, in metres
        double Distance = distanceBetween(Current_Location, LocationArrayList1.get(i));
        if (Distance <= 0.1 * 1000) { // if the distance is less than 100 m, play a sound
            Notification_Sound();
        }
    }
}
You can't literally look at every record in the database at the exact same instant; genuinely comparing everything in parallel like that isn't something an app sitting on top of SQLite can do.
That being said, you can make your algorithm much more efficient, though it takes some effort. Both approaches below are based on the same idea: quickly eliminate the majority of locations that are obviously too far away, and only perform the more expensive exact check on the ones that could be in range.
One method is to keep the locations sorted in ascending order in two arrays, one by latitude (North/South) and one by longitude (East/West). Find the entries within the given distance of the current position in each array, then combine the two result sets to get the points inside a box of size X around the location. That box holds a much smaller number of points, and you can apply the exact, circular, distance-based check to just those.
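A rough sketch of that prefilter in Scala (the structure ports directly to Java): Loc and the helper names are illustrative, the degree-to-metre conversion is the usual small-area approximation, and in a real app you would sort the two arrays once up front rather than on every call.

case class Loc(id: Int, lat: Double, lon: Double)

// First index in `keys` (sorted ascending) whose value is >= target.
def lowerBound(keys: Vector[Double], target: Double): Int = {
  var lo = 0; var hi = keys.length
  while (lo < hi) {
    val mid = (lo + hi) >>> 1
    if (keys(mid) < target) lo = mid + 1 else hi = mid
  }
  lo
}

def candidatesInBox(locs: Vector[Loc], centerLat: Double, centerLon: Double,
                    radiusMeters: Double): Vector[Loc] = {
  val dLat = radiusMeters / 111000.0                                     // metres -> degrees of latitude
  val dLon = dLat / math.max(math.cos(math.toRadians(centerLat)), 1e-6)  // longitude window widens with latitude

  val byLat = locs.sortBy(_.lat); val latKeys = byLat.map(_.lat)
  val byLon = locs.sortBy(_.lon); val lonKeys = byLon.map(_.lon)

  // ids falling inside the north/south band and inside the east/west band
  val latIds = byLat.slice(lowerBound(latKeys, centerLat - dLat),
                           lowerBound(latKeys, centerLat + dLat)).map(_.id).toSet
  val lonIds = byLon.slice(lowerBound(lonKeys, centerLon - dLon),
                           lowerBound(lonKeys, centerLon + dLon)).map(_.id).toSet

  val inBox = latIds intersect lonIds
  locs.filter(l => inBox(l.id))  // run the exact circular distance check on just these
}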
Another is to create a quadtree. This subdivides the map area into a set of bounding volumes, where each volume holds either a set of points or further bounding volumes. You can then take your circular search area, find all the quadtree cells that intersect it, and only run a true distance check on the locations inside those cells, greatly reducing the work.
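A bare-bones point quadtree isn't much code either. This is a hedged sketch, again in Scala with illustrative names; it doesn't handle degenerate cases such as many duplicate points, and the caller still applies the exact circular check to whatever the box query returns.

case class Pt(x: Double, y: Double)

// Each node covers a square centred on (cx, cy) with half-width `half`; it stores
// points until `capacity` is exceeded, then splits into four quadrants.
class QuadTree(cx: Double, cy: Double, half: Double, capacity: Int = 8) {
  private var points: List[Pt] = Nil
  private var kids: Array[QuadTree] = null

  private def covers(p: Pt): Boolean =
    p.x >= cx - half && p.x <= cx + half && p.y >= cy - half && p.y <= cy + half

  def insert(p: Pt): Boolean =
    if (!covers(p)) false
    else if (kids == null && points.size < capacity) { points ::= p; true }
    else {
      if (kids == null) split()
      kids.exists(_.insert(p))   // the first quadrant that covers the point takes it
    }

  private def split(): Unit = {
    val h = half / 2
    kids = Array(new QuadTree(cx - h, cy - h, h, capacity), new QuadTree(cx + h, cy - h, h, capacity),
                 new QuadTree(cx - h, cy + h, h, capacity), new QuadTree(cx + h, cy + h, h, capacity))
    val old = points; points = Nil
    old.foreach(p => kids.exists(_.insert(p)))
  }

  // All stored points inside an axis-aligned box.
  def queryBox(minX: Double, minY: Double, maxX: Double, maxY: Double): List[Pt] =
    if (maxX < cx - half || minX > cx + half || maxY < cy - half || minY > cy + half) Nil
    else {
      val here = points.filter(p => p.x >= minX && p.x <= maxX && p.y >= minY && p.y <= maxY)
      if (kids == null) here
      else here ::: kids.toList.flatMap(_.queryBox(minX, minY, maxX, maxY))
    }
}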

pgr_drivingDistance with flexible distance value on each route

I would like to calculate a graph similar to an isochrone using pgsql. I have already used the pgr_drivingDistance algorithm for this: you provide a starting point and a distance value and receive an isochrone.
The output of the algorithm is produced with code that looks something like this:
SELECT * FROM pgr_drivingDistance(
'SELECT id, source, target, cost FROM edge_table',
2, 2, false -- starting point, distance, directed
);
The red star represents the starting point.
Now I want a graph that works the same way, starting at one point and producing routes in all directions. The difference is that I don't want to provide a travel distance; instead I have a list of point coordinates that lie on the road network, and the route in each direction has to stop at the first such point it reaches. The distance along every route is therefore different, and I don't know in advance which points are the closest ones.
The desired output using the "stopping" points, which are visualized in green, is supposed to look like this.
What I have tried already:
Using the given pgr_drivingDistance algorithm and raising the distance value every time no point is reached. The problem here: the distance is the same for all directions instead of being individual to each route.
Using pgr_dijkstra for each route. The problem here: because you don't know which point will be hit first, you don't know which end point to choose for the calculation, and you can't simply take the closest point in a straight line, because you need the closest one along the specific route.
I know I would have to build an almost completely new algorithm, but maybe someone has an idea of how to start, or even experience with this kind of problem.
Thank you in advance!
This is a one-to-many routing problem: you have to compute the route to each end point to find the shortest one. I have not looked at the pgRouting functions recently, but I believe there are one-to-many, many-to-one and many-to-many Dijkstra variants. You should be able to use the one-to-many form to compute all the routes in one go, and then sort the routes by length to find the shortest one.
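A hedged sketch of that idea, run from a JVM client via JDBC. It assumes the edge_table from the question, the one-to-many form of pgr_dijkstra, and made-up end vertex ids (3, 5 and 11 stand in for the green points snapped to the network); the result column names (end_vid, agg_cost) may differ slightly between pgRouting versions.

import java.sql.DriverManager

object NearestStop extends App {
  val conn = DriverManager.getConnection("jdbc:postgresql://localhost/routing", "user", "secret")
  // The last agg_cost per end vertex is that route's total cost, so MAX(agg_cost)
  // gives the length of each route; ordering by it puts the nearest stop first.
  val sql =
    """SELECT end_vid, MAX(agg_cost) AS total_cost
      |FROM pgr_dijkstra(
      |       'SELECT id, source, target, cost FROM edge_table',
      |       2,                -- start vertex (the red star)
      |       ARRAY[3, 5, 11],  -- candidate stopping vertices (the green points)
      |       false)
      |GROUP BY end_vid
      |ORDER BY total_cost""".stripMargin
  val rs = conn.createStatement().executeQuery(sql)
  while (rs.next())
    println(s"vertex ${rs.getLong("end_vid")} reachable at cost ${rs.getDouble("total_cost")}")
  conn.close()
}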

How can I write a logical process for finding the area of a point on a graph?

I have the following graph with 2 different parameters called p and t. 
Their relationship was found experimentally. Manually, knowing (t, p), you can simply find the area number (group) of the point based on where it is located; for example, point M(t, p) lies in area 3 and therefore belongs to group number 3. However, I would like to write code, or a logical approach, that finds the group number automatically: when it reads (t, p) it should locate the point and return the group/area number it belongs to.
Is there any solution in Matlab for this? Graph
If you have the Image Processing Toolbox and your contours are closed, you can use imfill to fill them in (a bit like the bucket tool in Paint) and assign a different value to each filled region. Does this make sense to you? Let me know if you would like more detail.
Marta

Find points near LineString in mongodb sorted by distance

I have an array of points representing a street (black line) and points representing places on a map (red points). I want to find all the points near the specified street, sorted by distance. I also need the ability to specify a max distance (blue and green areas). Here is a simple example:
I thought of using the $near operator, but it only accepts a Point as input, not a LineString.
How can MongoDB handle this type of query?
As you mentioned, Mongo currently doesn't support anything other than Point. Have you come across the concept of a route boxer? [1] It was very popular a few years back on Google Maps: given the line that you've drawn, find stops that are within dist(x). It works by creating a series of bounding boxes around each point in the line and searching for points that fall within each box.
I stumbled upon your question after I had just realised that Mongo only works with points, which I suppose is reasonable.
I already have a few options for how to do it (they expand on what #mnemosyn says in the comment). With the dataset I'm working on, everything is on the client side, so I could use the RouteBoxer there, but I would like to implement it server-side for performance reasons. Here are my suggestions:
Break the LineString down into its individual coordinate sets and query for $near using each of them, then combine the results and extract the unique set of documents. There are algorithms out there for simplifying a complex line by reducing the number of points, and a simple one is easy to write (a sketch of this, combined with the point-injection fix from the edit below, is at the end of this answer).
Do the same as above, but as a stored procedure/function. I haven't played around with Mongo's stored functions and I don't know how well they work with drivers, but this could be faster than the first option, since you avoid round trips and, depending on the machine(s) your Mongo instance(s) are hosted on, the calculations could be faster by microseconds.
Implement the RouteBoxer approach server-side (it has been done in PHP), and then use either of the two options above to find stops that are $within the resulting bounding boxes. In fact, since the RouteBoxer method returns rectangles, it would be possible to merge all of them into one polygon covering your route and just run a single $within query on that (which is what #mnemosyn suggested).
EDIT: I thought of this but forgot to mention it: it might be possible to achieve some of the above using the aggregation framework.
It's something I'm going to be working on soon (hopefully); I'll open-source whichever result I end up going with.
EDIT: I must mention that options 1 and 2 have a flaw: if two consecutive points on your line are, say, 2 km apart and you want places within 1.8 km of the line, you will obviously miss all the places lying between that pair of points. The solution is to inject extra points into the line when simplifying it (I know, that defeats the objective of reducing points if you then add new ones back in).
The flaw with option 3 is that it won't always be accurate, since some points inside the merged polygon may lie at a distance greater than your limit, though the difference shouldn't be a significant percentage of the limit.
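Putting option 1 and the point-injection fix together, here is a hedged sketch in Scala. The actual $near query is hidden behind a nearPoint callback that you would implement with your driver of choice (that helper is hypothetical); only the densify/combine/de-duplicate logic is shown, and the interpolation is planar, which is fine for short segments.

case class Place(id: String, lon: Double, lat: Double)

// Insert extra vertices so that consecutive coordinates are never farther apart
// than maxGap (same units as the coordinates).
def densify(line: Seq[(Double, Double)], maxGap: Double): Seq[(Double, Double)] =
  line.sliding(2).flatMap { case Seq((x1, y1), (x2, y2)) =>
    val n = math.max(1, math.ceil(math.hypot(x2 - x1, y2 - y1) / maxGap).toInt)
    (0 until n).map(i => (x1 + (x2 - x1) * i / n, y1 + (y2 - y1) * i / n))
  }.toSeq :+ line.last

def placesNearLine(line: Seq[(Double, Double)], maxGap: Double,
                   nearPoint: (Double, Double) => Seq[Place]): Seq[Place] =
  densify(line, maxGap)
    .flatMap { case (lon, lat) => nearPoint(lon, lat) }  // one $near per vertex
    .groupBy(_.id).map(_._2.head).toSeq                  // drop duplicates across queries

If you still need the results ordered by distance to the street, compute each returned place's distance to its nearest line vertex (or segment) afterwards and sort on that, since combining per-point $near results loses the global ordering.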
[1] Google Maps utils RouteBoxer
As you said, Mongo's $near only works with a point, not a line, as the centre; but if you flip your premise from "find points near the line" to "find the line near the point", then you can use each point as the centre and the line as the target.
This is the difference between
for each line, find the points near it
and
for each point, find the line near it.
If you have a large number of points to check, you can combine this with nevi_me's answer to reduce the list of points that needs checking to a much smaller subset.