Lat & Long Data for Clustering on the basis of count of shops in each Lat & Long - cluster-analysis

I have district-level data with the following columns: the count of shops, the latitude and longitude of each district's headquarters, and the district area in km². I want to cluster these districts so that I can assign a person to contact the shops. If a district's count is high, I would assign one person to that district alone; if the count is low, I would assign one person to 2 or 3 nearby districts.
You can find the data at the site below:
http://notepad.pw/distdata
Please explain which type of clustering to use and which dissimilarity measure to use with it, and, more generally, how to approach the problem. Thanks in advance.

Related

Tableau: how to make a count if in a for loop?

I'm just starting off Tableau and would like to do a count if in a for loop.
I have the following variables:
City
User
Round: takes values of either A or B
Amount
I would like to have a countif function that shows how many users received any positive amount in both round A and round B in a given city.
In my dashboard, each row represents a city, and I would like to have a column that shows the total number of users in each city that received amounts in both rounds.
Thanks!
You can go for a simple solution that works.
Create a calculated field called "Positive Rounds per User" using the below formula:
// counts the number of unique rounds that had positive amounts per user in a city
{ FIXED [User], [City]: COUNTD(IIF([Amount]>0, [Round], NULL))}
You can use the above to create another calculated field called "unique users":
// unique number of users that have 2 in "Positive Rounds per User" field
COUNTD(IIF([Positive Rounds per User]=2, [User], NULL))
You can combine calculations 1 and 2 into a single field, but the result is harder to read, so it is better to keep them separate.
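For readers who want to check the logic outside Tableau, the two calculated fields can be sketched in Python. This is only an illustration under assumptions: the function name and the (city, user, round, amount) row layout are hypothetical and not part of the workbook.

```python
from collections import defaultdict

def users_paid_in_both_rounds(rows):
    """rows: iterable of (city, user, round, amount) tuples (assumed layout)."""
    # Step 1: distinct rounds with a positive amount, per (city, user).
    positive_rounds = defaultdict(set)
    for city, user, rnd, amount in rows:
        if amount > 0:
            positive_rounds[(city, user)].add(rnd)

    # Step 2: per city, count the users whose step-1 value is 2.
    counts = defaultdict(int)
    for (city, _user), rounds in positive_rounds.items():
        if len(rounds) == 2:
            counts[city] += 1
    return dict(counts)

sample_rows = [
    ("Paris", "u1", "A", 10), ("Paris", "u1", "B", 5),
    ("Paris", "u2", "A", 7),  ("Paris", "u2", "B", 0),
    ("Rome",  "u3", "A", 3),
]
both_rounds = users_paid_in_both_rounds(sample_rows)
```

Cities with no qualifying user simply do not appear in the returned dictionary.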

Complex SELECT in Tarantool

There are two spaces, named e.g. Company and Cars. Space Company has a company id field (primary index) and a geolocation (point) field (secondary index). Space Cars has a car field (primary index) and a companies field (an array of all companies where this car can be rented). I need to get the top 10 Companies in a specified rectangle where a specific car can be rented. What is the (if I can say so) best way to achieve this?
Here I need to combine spatial and non-spatial indexes to get the result. My search plan is to look up the car tuple and get all of its companies (there may be 1000 of them), and then, in the other space, filter out 10 of them that lie within the specified rectangle.
My use case is similar to this (it is not a rent-a-car use case), but the logic is the same. There will be many more Companies than Cars (millions of Companies and 300-500k Cars). How can I optimize my plan to get this information, which indexes should I use, etc.? As you can see, a single select needs both spatial and non-spatial conditions.
I think the best strategy for this type of index would be to map your cars to points in another dimension, sufficiently far apart from each other. E.g. if your typical search is within a few square kilometers, make sure each car "coordinate" is at least a few dozen kilometers away from the nearest neighbour car. Then you can use our multi-dimensional RTREE index for the search.
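The idea of giving each car its own "layer" in an extra dimension can be illustrated in plain Python. This sketch does not use Tarantool's API: the linear scan only stands in for the RTREE lookup, and `CAR_SPACING`, the function names, and the choice of one indexed point per (company, car) pair are all assumptions made for illustration.

```python
# Assumed gap between car "layers"; pick it much larger than any search rectangle.
CAR_SPACING = 1000.0

def to_point(x, y, car_index):
    """Place a company location on the layer that belongs to one car."""
    return (x, y, car_index * CAR_SPACING)

def query_box(x_min, y_min, x_max, y_max, car_index):
    """Build a 3D box: the search rectangle, flattened onto one car's layer."""
    z = car_index * CAR_SPACING
    return (x_min, y_min, z), (x_max, y_max, z)

def in_box(point, box):
    """True if the point lies inside the axis-aligned box (bounds inclusive)."""
    low, high = box
    return all(lo <= p <= hi for p, lo, hi in zip(point, low, high))

def companies_in_box(points, box, limit=10):
    """Linear scan standing in for an R-tree lookup.

    points maps (company_id, car_index) -> 3D point.
    """
    hits = [company for (company, _car), pt in points.items() if in_box(pt, box)]
    return hits[:limit]

# Hypothetical data: "acme" rents cars 0 and 1, the others only car 1.
points = {
    ("acme", 0): to_point(1.0, 1.0, 0),
    ("acme", 1): to_point(1.0, 1.0, 1),
    ("bolt", 1): to_point(5.0, 5.0, 1),
    ("cars4u", 1): to_point(50.0, 50.0, 1),
}
nearby = companies_in_box(points, query_box(0, 0, 10, 10, car_index=1))
```

The `limit` here just truncates the result; a real "top 10" would need whatever ranking the application defines.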

Facebook Programming Challenge - ByteLand

ByteLand 
Byteland consists of N cities numbered 1..N. There are M roads connecting some pairs of cities. There are two army divisions, A and B, which protect the kingdom. Each city is either protected by army division A or by army division B.
 
You are the ruler of an enemy kingdom and have devised a plan to destroy Byteland. Your plan is to destroy all the roads in Byteland, disrupting all communication. If you attack any road, the armies from both cities that the road connects come to its defense. You realize that your attack will fail if there are soldiers from both armies A and B defending any road.
 
So you decide that before carrying out this plan, you will attack some of the cities and defeat the armies located there to make your plan possible. However, this is considerably more difficult. You have estimated that defeating the army located in city i will take up ci amount of resources. Your aim now is to decide which cities to attack so that your cost is minimum and no road is protected by both armies A and B.
 
----Please tell me if this approach is correct----
We need to sort the cities in terms of resources required to destroy the city. For each city we need to ask the following questions:
1) Did deletion of the previous city NOT result in a state which can destroy Byteland?
2) Does it connect any road?
3) Does it connect any road which is armed by a different city?
If all of these conditions are true, we'll proceed towards destroying the city and record the total cost incurred so far and also determine if destruction of this city will lead to overall destruction of Byteland.
Since the cities are arranged in increasing order of the cost incurred, we can stop wherever we find the desired set of deletions.
You need only care about roads that link two cities with different armies - links between A and B or links between B and A, so let's delete all links from A to A or B to B.
You want to find a set of points such that each link has at least one point on it, which is a minimum weight vertex cover. On an arbitrary graph this would be NP-complete. However, your graph only ever has nodes of type A linked to nodes of type B, or the reverse - it is a bipartite graph with these two types of nodes as the two parties. So you can find a minimum weight vertex cover by using an algorithm for finding minimum weight vertex covers on bipartite graphs. Searching for this, I find e.g. http://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-854j-advanced-algorithms-fall-2008/assignments/sol5.pdf
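One common way to compute a minimum weight vertex cover on a bipartite graph is to reduce it to a minimum cut. The sketch below is an illustration of that reduction, not the method from the linked solution set: the source feeds every A city with capacity equal to its cost, every B city drains to the sink with capacity equal to its cost, and each A-B road gets infinite capacity. The function name and the dictionary-based input format are assumptions.

```python
from collections import deque

def min_weight_vertex_cover(cost, army, roads):
    """cost: city -> cost, army: city -> 'A' or 'B', roads: list of (u, v)."""
    INF = float("inf")
    S, T = "source", "sink"
    cap = {S: {}, T: {}}  # residual capacities: cap[u][v]

    def add_edge(u, v, c):
        cap.setdefault(u, {})
        cap.setdefault(v, {})
        cap[u][v] = cap[u].get(v, 0) + c
        cap[v].setdefault(u, 0)  # reverse edge for the residual graph

    for city, c in cost.items():
        if army[city] == "A":
            add_edge(S, city, c)
        else:
            add_edge(city, T, c)
    for u, v in roads:
        if army[u] == army[v]:
            continue  # A-A and B-B roads never need defending against
        a, b = (u, v) if army[u] == "A" else (v, u)
        add_edge(a, b, INF)

    def bfs():
        """Return the BFS tree (node -> parent) over edges with spare capacity."""
        parent = {S: None}
        queue = deque([S])
        while queue:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        return parent

    total = 0
    while True:  # Edmonds-Karp: augment along shortest paths until none is left
        parent = bfs()
        if T not in parent:
            break
        bottleneck, v = INF, T
        while parent[v] is not None:
            bottleneck = min(bottleneck, cap[parent[v]][v])
            v = parent[v]
        v = T
        while parent[v] is not None:
            u = parent[v]
            cap[u][v] -= bottleneck
            cap[v][u] += bottleneck
            v = u
        total += bottleneck

    # Cut edges: A cities not reachable from the source, B cities reachable.
    cover = {c for c in cost if (army[c] == "A") != (c in parent)}
    return total, cover

# Small example: two cheap A cities both linked to one expensive B city.
cost = {1: 1, 2: 1, 3: 5}
army = {1: "A", 2: "A", 3: "B"}
roads = [(1, 3), (2, 3)]
total, cover = min_weight_vertex_cover(cost, army, roads)
```

Because the weights are the capacities, the cut picks the two cheap cities here rather than the single expensive one.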
mcdowella,
But the vertices have a cost to them and the minimum vertex cover would not produce the right vertices to remove. Imagine 2 vertices (A army) pointing to the third one (B). First two vertices cost 1 each, where the third one costs 5. A minimum vertex cover would return the third one - but removing the third one costs more than removing both nodes with cost 1 + 1.
We would probably need some modified version of a minimum vertex cover here.
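The difference between the smallest cover and the cheapest cover in this three-city example can be checked by brute force. This is a small sketch for tiny inputs only (it enumerates every subset of cities); the helper names are hypothetical.

```python
from itertools import combinations

def is_cover(roads, chosen):
    """True if every road has at least one endpoint in `chosen`."""
    return all(u in chosen or v in chosen for u, v in roads)

def best_cover(cost, roads, key):
    """Enumerate all subsets of cities and return the best cover under `key`."""
    cities = sorted(cost)
    candidates = [
        set(subset)
        for size in range(len(cities) + 1)
        for subset in combinations(cities, size)
        if is_cover(roads, set(subset))
    ]
    return min(candidates, key=key)

# Cities 1 and 2 (army A, cost 1 each) are both linked to city 3 (army B, cost 5).
cost = {1: 1, 2: 1, 3: 5}
roads = [(1, 3), (2, 3)]

smallest = best_cover(cost, roads, key=len)
cheapest = best_cover(cost, roads, key=lambda s: sum(cost[c] for c in s))
```

Minimizing the number of cities selects city 3, while minimizing the total cost selects cities 1 and 2, which is why the weighted variant of the problem is the relevant one.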

calculating percentage of average in tsql

I have a set of records associated with IDs. There may be any number of records in this association, each with a currency value. One of these values is flagged as selected. I need to calculate the average of all the associated currency values, then take a percentage between this average and the lowest value, grouped by ID. All the data needed is in one table:
input:
table x: ID, Selected, DollarAmt
output:
view y: ID, Average, Percentage
I'm having problems creating this query(view) and it's driving me nuts. can anyone at least point me in the right direction?
Thanks all.
You can use this query:
select Id,
       AVG(DollarAmt) as Average,
       -- NULLIF avoids a divide-by-zero error when the lowest value is 0
       AVG(DollarAmt) / NULLIF(MIN(DollarAmt), 0) as Percentage
from TableX
group by Id
But I still don't understand the need for the "Selected" column in TableX.
Regards
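To sanity-check the numbers the query should return, the same grouping can be sketched in Python. This is an illustration only: the (ID, DollarAmt) row layout and the function name are assumptions, and the "percentage" is computed as the plain ratio of the average to the lowest value, as in the query above.

```python
from collections import defaultdict
from statistics import mean

def average_and_ratio(rows):
    """rows: iterable of (record_id, dollar_amt) pairs (assumed layout)."""
    amounts = defaultdict(list)
    for record_id, dollar_amt in rows:
        amounts[record_id].append(dollar_amt)

    result = {}
    for record_id, values in amounts.items():
        avg = mean(values)
        lowest = min(values)
        # Leave the ratio undefined when the lowest value is 0.
        ratio = avg / lowest if lowest else None
        result[record_id] = (avg, ratio)
    return result

sample = [(1, 10.0), (1, 20.0), (1, 30.0), (2, 5.0), (3, 0.0), (3, 4.0)]
summary = average_and_ratio(sample)
```

For ID 1 the average is 20 and the lowest value is 10, so the ratio is 2; multiply by 100 if a percentage figure is wanted.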

How do you figure out what the neighboring zipcodes are?

I have a situation that's similar to what goes on in a job search engine where you type in the zipcode where you're searching for a job and the app returns jobs in that zipcode as well as in zipcodes that are 5, 10, 15, 20 or 25 miles from that zipcode, depending on preferences set by the user.
How would you calculate the neighboring locations for a zipcode?
You need to get a list of zip codes with associated longitude / latitude coordinates. Google it - there are plenty of providers.
Then take a look at this question for an algorithm of how to calculate the distance
I don't know if you can count on geonames.org to be around for the life of your app but you could use a web service like theirs to avoid reinventing the wheel.
http://www.geonames.org/export/web-services.html
I wouldn't calculate it; I would store it as a fixed table in the database (changing only when the allocation of ZIP codes changes in a country). Make a relationship "is_neighbor_zip", which holds pairs (smaller, larger). To determine whether two codes are neighboring, check the table for that specific pair. If you want all neighboring zips, it might be better to make the table symmetric.
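The (smaller, larger) pair convention can be sketched with an in-memory set standing in for the database table. The ZIP pairs below are invented for illustration and do not claim real adjacency; the function names are hypothetical.

```python
def make_pair(zip_a, zip_b):
    """Normalize a pair so the smaller code always comes first."""
    return (zip_a, zip_b) if zip_a <= zip_b else (zip_b, zip_a)

# Stand-in for the is_neighbor_zip table (made-up adjacency).
neighbor_pairs = {
    make_pair("10002", "10001"),
    make_pair("10001", "10003"),
}

def is_neighbor(zip_a, zip_b):
    """Look up one specific pair, in either argument order."""
    return make_pair(zip_a, zip_b) in neighbor_pairs

def neighbors_of(zip_code):
    """All neighbors of one code; this is the query a symmetric table makes simpler."""
    return sorted(
        b if a == zip_code else a
        for a, b in neighbor_pairs
        if zip_code in (a, b)
    )
```

Storing each pair once keeps the table half the size, at the price of checking both columns when listing all neighbors of a code.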
You need to use a GIS database and ask it for ZIP codes that are nearby your current location.
You cannot simply take the ZIP code number and apply some mathematical calculations to find other nearby ZIP codes. ZIP codes are not as geographically scattered as area codes in the US, but they are not a coordinate system.
The only exception is that the ZIP+4 codes are sub-sections of the larger ZIP code. You can assume that any ZIP+4 codes that have the same ZIP code are close to each other.
I used to work on rationalizing the ZIP code handling at a company, here are some practical notes I made:
Testing ZIP codes
Hopefully has other useful info.
Whenever you create a zipcode, geocode it (e.g. with the Google geocoder API, saving the latitude and longitude). Then look up the haversine formula: it calculates the distance (as the crow flies) from a reference point, which can also be geocoded if it is a town or zipcode.
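A minimal sketch of the haversine calculation, assuming coordinates in decimal degrees and a mean Earth radius of about 3958.8 miles. The `zips_within` helper and its input format (a dictionary of zipcode to (lat, lon)) are hypothetical, and the sample coordinates are made up.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius, approximate

def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))

def zips_within(origin_zip, zip_coords, radius_miles):
    """Other zipcodes whose geocoded point lies within radius_miles of the origin."""
    lat0, lon0 = zip_coords[origin_zip]
    return sorted(
        z for z, (lat, lon) in zip_coords.items()
        if z != origin_zip and haversine_miles(lat0, lon0, lat, lon) <= radius_miles
    )

# Made-up coordinates on the equator, purely for illustration.
zip_coords = {"A": (0.0, 0.0), "B": (0.0, 0.1), "C": (0.0, 1.0)}
within_10 = zips_within("A", zip_coords, 10)
```

With the user's radius preference (5, 10, 15, 20 or 25 miles) passed as `radius_miles`, the same function covers each search setting.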
To clarify some more:
When you are retrieving records based on their location, you need to compare each record's latitude and longitude DECIMAL values with a reference point (your user's geocoded postcode or town name).
You can query:
SELECT * FROM photos p WHERE p.lat > 50 AND p.lat < 60 AND p.long > -10 AND p.long < 10
to find all UK photos, for example, because the UK lies between roughly 50 and 60 degrees latitude and within about ±10 degrees longitude.
If you want to find the distance then you will need to google the haversine formula and plug in your reference values.
Hope this clears things up a little bit more, leave a comment if you need details