Complex SELECT in Tarantool

There are two spaces, named e.g. Company and Cars. The Company space has a company id field (primary index) and a geolocation (point) field (secondary index). The Cars space has a car field (primary index) and a companies field (an array of all companies where this car can be rented). I need to get the top 10 companies in a specified rectangle where a specific car can be rented. What is (if I can say so) the best solution for achieving this?
Here I need to combine spatial and non-spatial indexes to get the result. My search plan is to look up the car tuple and get all of its companies (there may be 1000 of them), and then, in the other space, filter out the 10 of them that fall within the specified rectangle.
My use case is actually something similar to this (not a rent-a-car use case), but the logic is all the same. There will be many more companies than cars (millions of companies and 300-500k cars). How can I optimize my plan to get this information, which indexes should I use, etc.? As you can see, there need to be both spatial and non-spatial conditions in one select.
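For reference, here is the plan above as a rough Python sketch (purely illustrative: the space and field names are assumptions, and real access would go through Tarantool's box.space API or a connector rather than plain dicts):

    # Naive plan: fetch the car tuple, walk its companies array, and keep the
    # first 10 companies whose point falls inside the rectangle.
    def top_companies(cars, companies, car_id, rect, limit=10):
        xmin, ymin, xmax, ymax = rect
        result = []
        for company_id in cars[car_id]["companies"]:
            x, y = companies[company_id]["point"]
            if xmin <= x <= xmax and ymin <= y <= ymax:
                result.append(company_id)
                if len(result) == limit:
                    break
        return result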

I think the best strategy for this type of index would be to map your cars to points in another dimension, sufficiently far apart from each other. E.g. if your typical search covers a few square kilometers, make sure each car "coordinate" is at least a few dozen kilometers away from its nearest neighbour. Then you can use our multi-dimensional RTREE index for the search.
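One way to picture the suggested mapping (a sketch only, using the Python rtree package as a stand-in for Tarantool's multi-dimensional RTREE index; the spacing constant, ids, and function names are assumptions):

    from rtree import index

    CAR_SPACING = 100_000  # keep different cars "a few dozen kilometers" apart, in meters

    props = index.Property()
    props.dimension = 3            # x, y, plus one synthetic car dimension
    idx = index.Index(properties=props)

    def add_rental(entry_id, x, y, car_id):
        z = car_id * CAR_SPACING
        # interleaved 3-D bounding box: (xmin, ymin, zmin, xmax, ymax, zmax)
        idx.insert(entry_id, (x, y, z, x, y, z))

    def companies_with_car_in_rect(car_id, xmin, ymin, xmax, ymax, limit=10):
        z = car_id * CAR_SPACING
        hits = idx.intersection((xmin, ymin, z, xmax, ymax, z))
        return [h for _, h in zip(range(limit), hits)]

One 3-D point is inserted per (company, car) pair, so a single box query filters on the rectangle and the car at the same time.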

Related

More Efficient Way of Calculating Population from Data Grid and overlapping Polygon?

Folks! Apologies if this is a duplicate question; I've done some research on the topic but don't know if I'm heading in the right direction.
I have converted gridded population density data into a MongoDB collection, using a geometry object that defines each population density cell as a five-node polygon (the fifth node matching the first) plus a float value holding the population in that geographic region. Even though the database is huge, I can quickly retrieve the "records" of the population regions that intersect a geo-polygon (some type of weather event or other geofence polygon), since they are indexed with a 2dsphere index.
The issue comes when I try to add all of the boxes up. It takes an exceedingly long time, especially if the polygon covers a significant geographic area. The population data I have are 1 km^2 cells. Adding up the data can take several seconds or, in the worst case, minutes!
I had the idea of creating a kind of quadtree structure in the database, by storing a lower-resolution node set as a separate collection, and so on. Then, when calculating population, I could start with the lowest-resolution set and work my way down the node "tree" by making several database calls until there are no more matches. While I'd increase my database calls significantly, I'd reduce the sheer number of elements I need to add up at the end, which is what takes most of the computational time.
I could build these data bottom-up, finding neighbours and adding up the four population values that make up each node of the next lower-resolution set. This will, of course, blow up the database size and increase the number of queries to the database for a single population request.
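For what it's worth, the bottom-up aggregation could look roughly like this (pure Python; the cell keys, level numbering, and 2x2 grouping are illustrative assumptions, and in MongoDB each level would live in its own collection):

    def build_levels(base_cells, num_levels):
        """base_cells: {(ix, iy): population} at the finest (1 km^2) resolution."""
        levels = {0: dict(base_cells)}
        for lvl in range(1, num_levels):
            coarser = {}
            for (ix, iy), pop in levels[lvl - 1].items():
                key = (ix // 2, iy // 2)          # 2x2 children -> 1 parent cell
                coarser[key] = coarser.get(key, 0) + pop
            levels[lvl] = coarser
        return levels

A population query would then start at the coarsest level, sum the cells that lie fully inside the polygon, and only descend to finer levels along the polygon's boundary.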
I haven't seen much of this done with databases. I'd like to keep it in a database (it could also be PostgreSQL), since that gives me the ability to quickly geo-query by point or area. And since I'm returning the result via an API call, time efficiency is of the essence!
Any advice or places to research would be greatly appreciated!!!

How to do in-memory search for polygons that contain a given point?

I have a PostgreSQL table with a geometry-typed column, in which different simple polygons (possibly intersecting) are stored. The polygons are all areas within a city. I receive a point (a latitude-longitude pair) as input and need to find the list of polygons that contain the given point. What I have currently:
Unclustered GiST index defined on the polygon column.
Use ST_Contains(#param_Point, table.Polygon) on the whole table.
It is quite slow, so I am looking for a more performant in-memory alternative. I have the following ideas:
Maintain a dictionary of polygons in Redis, keyed by their geohash. Polygons with the same geohash would be saved in a list. When I receive the point, I calculate its geohash and trim it to a desired level, then search the Redis map, trimming the point's geohash further until I find the first result (or enough results).
Have a trie of geohashes loaded from the database, updated periodically or via update events. Calculate the point's geohash and search the trie until I find enough results. I prefer this option because the map may hold long lists for a single geohash, given the nature of the polygons.
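A rough sketch of the trie idea (pure Python; the geohash encoding itself is assumed to come from a library such as pygeohash, and computing the covering geohash cells for each polygon is out of scope here):

    class GeohashTrie:
        def __init__(self):
            self.root = {}  # nested dicts, one level per geohash character

        def insert(self, geohash_prefix, polygon_id):
            node = self.root
            for ch in geohash_prefix:
                node = node.setdefault(ch, {})
            node.setdefault("_ids", []).append(polygon_id)

        def candidates(self, point_geohash):
            """Polygon ids stored on any prefix of the point's geohash."""
            node, found = self.root, []
            for ch in point_geohash:
                found.extend(node.get("_ids", []))
                node = node.get(ch)
                if node is None:
                    break
            else:
                found.extend(node.get("_ids", []))
            return found

The candidates would still need an exact point-in-polygon check afterwards, since geohash cells only approximate the polygon shapes.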
Any other approaches?
I have read about libraries like GeoTrie and Polygon Geohasher but can't seem to integrate them with the database and the above ideas.
Any cues or starting points, please?
Have you tried using ST_Within? Not sure if it meets your criteria, but I believe it is meant to be faster than ST_Contains.

Schema for golf tournament scores

I work for a small non-profit organization and we are looking for a way to quickly tally scores and provide a few statistics for donors at our annual golf tournament. I thought this would be fairly easy, but I'm struggling to come up with a database schema to capture the scores. I can't figure out how the player's score relates to the specific hole on the course.
This is the diagram that I have so far. Am I way off base with this?
The Schema can be found here: https://app.quickdatabasediagrams.com/#/schema/forneGJp40inm7rWlf2Sbg
Perhaps make the Scores table an m:n join table between Players and Holes, to capture each player's score on each hole. This is depicted in the diagram below. To get the score for a round, you'd sum all scores for all holes with a specific CourseId, for a specific event.
I also denormalised it a little, adding a total score to the Rounds table. This means that you don't need to SUM() the individual scores every time to get the tallies for each Player's Round. That's just a suggestion for performance optimisation.
Source: https://app.quickdatabasediagrams.com/#/schema/x_amshIckkeGp8KAKEAmLQ
If it is possible to play the same course twice in the same event (the first and last matches could be on the same course, for example) then you should provide for that.
I have two other suggestions:
I think the relationship between Events and Venues is in the wrong direction.
I suggest splitting Players into two tables. One representing the human being, and the other representing the human's participation in the round. Perhaps "Person" and "Contestant" would be good names.
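A rough sketch of that shape, written as SQLite DDL run from Python just to show the relationships (table and column names are illustrative, not necessarily the ones in the linked diagram):

    import sqlite3

    ddl = """
    CREATE TABLE Person     (PersonId INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Course     (CourseId INTEGER PRIMARY KEY, VenueId INTEGER, Name TEXT);
    CREATE TABLE Hole       (HoleId INTEGER PRIMARY KEY,
                             CourseId INTEGER REFERENCES Course(CourseId),
                             Number INTEGER, Par INTEGER);
    -- A Contestant is a Person's participation in a specific Event.
    CREATE TABLE Contestant (ContestantId INTEGER PRIMARY KEY,
                             PersonId INTEGER REFERENCES Person(PersonId),
                             EventId INTEGER);
    -- Denormalised TotalScore avoids re-summing hole scores on every read.
    CREATE TABLE Round      (RoundId INTEGER PRIMARY KEY,
                             ContestantId INTEGER REFERENCES Contestant(ContestantId),
                             CourseId INTEGER REFERENCES Course(CourseId),
                             TotalScore INTEGER);
    -- Score is the m:n join capturing each contestant's strokes on each hole.
    CREATE TABLE Score      (RoundId INTEGER REFERENCES Round(RoundId),
                             HoleId INTEGER REFERENCES Hole(HoleId),
                             Strokes INTEGER,
                             PRIMARY KEY (RoundId, HoleId));
    """

    conn = sqlite3.connect(":memory:")
    conn.executescript(ddl)

Because Score hangs off Round rather than directly off Person, the same course can be played twice in one event without ambiguity.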

Designed PostGIS Database...Points table and polygon tables...How to make more efficient?

This is a conceptual question, but I should have asked it long ago on this forum.
I have a PostGIS database with many tables in it. I have done some research on the use of keys in databases, but I'm not sure how to incorporate keys in the case of point data that is dynamic and grows over time.
I'm storing point data in one table, and this data grows each day. It's about 10 million rows right now and will probably grow by about 10 million rows each year. There are lat, lon, time, and the_geom columns.
I have several other tables, each representing a different polygon group (shapefiles converted to tables with shp2pgsql), such as counties, states, etc.
I'm writing queries that relate the point data to the spatial tables to see if points are inside of the polygons, resulting in things like "55 points in X polygon in the past 24 hours", etc.
The problem is, I don't have a key that relates the point table to the other tables. I think this is probably inhibiting query efficiency, but I'm not sure.
I know this question is fairly vague, and I'm happy to clarify anything, but I basically have a bunch of points in a table that I'm spatially comparing to other tables, and I'm trying to find the best way to design things.
Thanks for any help!
If you don't have them already, you should build a spatial index on both the points table and the polygon tables.
Even so, spatial comparisons are usually slower than numerical comparisons.
So adding one or more key columns to the points table referencing the other tables, and using them in your SELECT queries instead of spatial operations, will surely speed things up.
Obviously, inserts will be slower, but given the numbers you quoted (10 million per year), that should not be a problem.
Probably just adding a foreign key to the smallest entities (cities, for example) and joining to the others (counties, states, ...) to get results will be faster than a spatial comparison.
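To show the shape of the key-based query (plain SQLite stands in here purely to illustrate; in PostGIS the county_id would be assigned once per point at insert time with a spatial join such as ST_Contains, and all names are illustrative):

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.executescript("""
    CREATE TABLE points (
        id        INTEGER PRIMARY KEY,
        lat       REAL,
        lon       REAL,
        t         TEXT,       -- observation time
        county_id INTEGER     -- filled in once, spatially, when the point arrives
    );
    -- B-tree index so "points per county in the last 24 hours" never touches geometry.
    CREATE INDEX idx_points_county_time ON points (county_id, t);
    """)

    rows = conn.execute("""
        SELECT county_id, COUNT(*) AS n
        FROM points
        WHERE t >= datetime('now', '-1 day')
        GROUP BY county_id
    """).fetchall()

Coarser groupings (state, country) can then be reached by joining county_id up the hierarchy instead of repeating the spatial test.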
Foreign keys (and other constraints) are not needed to query. Moreover, they arise as a consequence of whatever design turns out to be appropriate to the application, per principles of good design.
They just tell the DBMS that a list of values under a list of columns in one table also appears elsewhere as a list of values under a list of columns in some table (useful for avoiding errors and improving optimization).
You would still want indices on the columns involved in joins. E.g. you might want the X coordinates in two tables to have sorted indices, in the same order. This is independent of whether one column's values form a subset of another's, i.e. whether a foreign key constraint holds between them.

Facebook Programming Challenge - ByteLand

ByteLand 
Byteland consists of N cities numbered 1..N. There are M roads connecting some pairs of cities. There are two army divisions, A and B, which protect the kingdom. Each city is either protected by army division A or by army division B.
 
You are the ruler of an enemy kingdom and have devised a plan to destroy Byteland. Your plan is to destroy all the roads in Byteland, disrupting all communication. If you attack any road, the armies from both cities that the road connects come to its defense. You realize that your attack will fail if there are soldiers from both armies A and B defending any road.
 
So you decide that before carrying out this plan, you will attack some of the cities and defeat the armies located in them to make your plan possible. However, this is considerably more difficult. You have estimated that defeating the army located in city i will take ci amount of resources. Your aim now is to decide which cities to attack so that your cost is minimal and no road is defended by both armies A and B.
 
----Please tell me if this approach is correct----
We need to sort the cities by the resources required to destroy them. For each city we need to ask the following questions:
1) Did the deletion of the previous cities NOT already produce a state in which Byteland can be destroyed?
2) Is the city an endpoint of any road?
3) Is it an endpoint of any road whose other city is defended by the other army?
If all of these conditions are true, we proceed to destroy the city, record the total cost incurred so far, and also check whether destroying this city makes the overall destruction of Byteland possible.
Since the cities are processed in increasing order of cost, we can stop as soon as we have found the desired set of deletions.
You only need to care about roads that link two cities with different armies - links between A and B or between B and A - so let's delete all links from A to A or from B to B.
You want to find a set of vertices such that every remaining link has at least one of its endpoints in the set, which is a minimum weight vertex cover. On an arbitrary graph this would be NP-complete. However, your graph only ever has nodes of type A linked to nodes of type B, or the reverse - it is a bipartite graph with these two types of nodes as the two parties. So you can find a minimum weight vertex cover by using an algorithm for minimum weight vertex cover on bipartite graphs. Searching for this, I find e.g. http://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-854j-advanced-algorithms-fall-2008/assignments/sol5.pdf
mcdowella,
But the vertices have costs, and a minimum vertex cover would not produce the right vertices to remove. Imagine two vertices (army A) both pointing to a third one (army B). The first two vertices cost 1 each, while the third one costs 5. A minimum vertex cover would return the third one - but removing the third one costs more than removing both nodes with cost 1 + 1.
We would probably need some modified version of a minimum vertex cover here.
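To make the weighted version concrete, here is a small sketch of one standard reduction: minimum weight vertex cover on a bipartite graph via max-flow / min-cut (networkx is assumed to be available; the city names and costs below are just the 1 + 1 vs 5 example from the comment above):

    import networkx as nx

    def min_weight_cover(a_costs, b_costs, roads):
        """a_costs, b_costs: {city: cost}; roads: (a_city, b_city) pairs."""
        G = nx.DiGraph()
        for a, cost in a_costs.items():
            G.add_edge("S", ("A", a), capacity=cost)
        for b, cost in b_costs.items():
            G.add_edge(("B", b), "T", capacity=cost)
        for a, b in roads:
            G.add_edge(("A", a), ("B", b))   # no capacity attribute = infinite capacity
        cut_value, (src_side, sink_side) = nx.minimum_cut(G, "S", "T")
        cover = [n[1] for n in sink_side if isinstance(n, tuple) and n[0] == "A"]
        cover += [n[1] for n in src_side if isinstance(n, tuple) and n[0] == "B"]
        return cut_value, sorted(cover)

    # Two cost-1 cities of army A, both linked to one cost-5 city of army B:
    print(min_weight_cover({"a1": 1, "a2": 1}, {"b1": 5},
                           [("a1", "b1"), ("a2", "b1")]))
    # -> (2, ['a1', 'a2'])  -- the cheap pair is chosen, not the cost-5 city

Since the costs appear as capacities, this is exactly the "modified version" the comment asks for: the weighted cover picks the two cost-1 cities rather than the cost-5 one.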