Looking for free or paid FSA polygons of Canada for project. Statscan data is free but number of FSA is too low - leaflet

I downloaded boundary data from Statscan (Statistics Canada) containing 1643 FSA polygons (an FSA is the first 3 characters of a postal code).
There are two datasets that are meant to be combined; I hope the layout below is clear.
DATA1 - provided to me as descriptors for specific FSAs

FSA        DATA PROVIDED
FSA1       text1
...        text ...
FSA667     text 1667

DATA2 - downloaded from Statscan as a .shp file

FSA        Polygon coords
FSA1       Cell 2
FSA1643    Cell 4

I am combining the polygons with another dataset to show layover data for each FSA. The problem is that I was provided with data covering 1667 FSAs, and I have been asked to produce a map that reflects that dataset (1667 items of layover information) when combined with an equal and matching number of polygons.
Effectively, 1667 - 1643 = 24 FSAs have no polygon.
Does anyone know a good source for FSA polygons only? Other than Statistics Canada I can't seem to find what I need. Paid or free, I need to see what's out there and available.
Link to statscan https://www12.statcan.gc.ca/census-recensement/2021/geo/sip-pis/boundary-limites/index2021-eng.cfm?year=21
I am using leaflet.js to show the data, but this is really a question about the datasets themselves. In summary, I am looking for 1667 polygons representing Canadian FSAs (forward sortation areas) and can only find 1643.
Thanks
I can successfully import and view the data in QGIS; the issue is the data itself. I am looking for 1667 FSA polygons, not the 1643 that are all I can find online. Hopefully free, but paid is an option.
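
Before hunting for another source, it may help to pin down exactly which FSAs have descriptor data but no polygon. A minimal sketch in Python, assuming both FSA columns have already been read into plain lists of strings; the function name and the sample codes are hypothetical and not taken from either real dataset:

```python
def normalize(code):
    """Trim whitespace and upper-case an FSA code so 'v6b ' matches 'V6B'."""
    return code.strip().upper()


def missing_fsas(descriptor_fsas, polygon_fsas):
    """Return the FSA codes that have descriptor data but no polygon, sorted."""
    have_polygon = {normalize(code) for code in polygon_fsas}
    wanted = {normalize(code) for code in descriptor_fsas}
    return sorted(wanted - have_polygon)


# Made-up sample codes for illustration only.
descriptors = ["K1A", "M5V", "X0A", "v6b "]
polygons = ["K1A", "M5V", "V6B"]
print(missing_fsas(descriptors, polygons))  # ['X0A']
```

Run against the real 1667 and 1643 code lists, the result would be the codes to look for in any alternative polygon source.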

Related

How do I plot one measure per axis in Tableau?

I have a problem that I have reduced to its essence with the following CSV file. Imagine we're a company that sells potatoes and apples, and each customer is assigned a potato-class and an apple-class.
What I want is to plot the sales according to class - so apple sales by apple class and potato sales by potato class, in one diagram. Dragging all the measures into a worksheet, I get this:
So I would like an overlay of the top left and bottom right classes.
If I combine everything into one diagram via dual axis, I get this:
So Tableau is plotting potato sales and apple sales on both the potato and the apple class axis, creating four dots per class where I want two.
Does anyone have an idea on how to basically assign one measure to one axis instead of both measures to both axes? (Hiding the "wrong" dots would also be fine).
Also, I realize that pivoting the dataset to have fields "sales", "class" and "product" would solve the problem, but reality is of course far more complicated than this toy example and it's just not feasible.
Thanks!
You'll likely have an easier time if you reshape your data first, say to have the following columns "Customer,Item,Class,Amount" -- so each row in your original data set would yield 2 rows in the transformed version of your data set. Tableau Prep can make those types of transformations easy (and repeatable), but it is possible to do something similar in Tableau Desktop alone (using a self-union and some calculated fields).
So the first 2 lines might be:
Customer,Item,Class,Amount
1,"Apple",1,2
1,"Potato",1,4
Either way make sure Class is treated as a dimension in Tableau.
Data wranglers often call this a tall format instead of a wide format.
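
If the reshaping has to happen outside Tableau, the wide-to-tall step can be sketched in a few lines of Python. This is only an illustration: the original CSV is not shown, so the wide column names (`customer`, `apple_class`, `apple_sales`, `potato_class`, `potato_sales`) and the sample values are assumptions.

```python
import csv
import io

# Hypothetical wide layout: one row per customer, one class/sales pair per product.
WIDE_CSV = """customer,apple_class,apple_sales,potato_class,potato_sales
1,1,2,1,4
2,3,5,2,1
"""


def wide_to_tall(wide_text, items=("apple", "potato")):
    """Turn each wide row into one tall row per item (Customer, Item, Class, Amount)."""
    rows = []
    for record in csv.DictReader(io.StringIO(wide_text)):
        for item in items:
            rows.append({
                "Customer": record["customer"],
                "Item": item.capitalize(),
                "Class": record[f"{item}_class"],
                "Amount": record[f"{item}_sales"],
            })
    return rows


tall = wide_to_tall(WIDE_CSV)
for row in tall:
    print(",".join(row.values()))
```

Each input row yields two output rows, so the two sample customers above produce four tall rows.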

Sort by Section of Sum Field in Tableau

I'm new to Tableau Desktop, so I'm guessing what I want to do is simple, but I don't know how to do it.
Basically, I have basketball data that gives me players' total points scored over several seasons with different NBA teams. I'm trying to sort that data by team, based on the number of points each player scored for that specific team.
Right now, I have the data sorted by team, player, and the total number of points scored. The problem is - I don't actually want the total sum. (E.g. right now Shaq is listed first under the Celtics because he has the most career points out of anyone who played for the Celtics, but not for the Celtics themselves.)
Can someone tell me how I would go about sorting by sum points by team?
This is actually such a common issue that Tableau references a solution in their official training material. If my understanding of your requirement is correct, the following should solve the problem.
http://kb.tableau.com/articles/knowledgebase/finding-top-n-within-category

Openstreetmaps - continuous road?

I imported the OSM data for Switzerland into Postgres and I am interested in getting the road data for a continuous part of a highway (I know the name), that is, the part that connects two specific cities. The highway is quite big (A1) and connects a lot of cities together.
I am not sure how the sequence of road segments is stored in Postgres (i.e., how one knows that one road segment comes directly after another). How should I query Postgres to get a linestring with the route from one city to another? I can visualize the data for the whole highway (which spans multiple cities) in Quantum GIS with the query:
select osm_id,way from planet_osm_roads where highway='motorway' and ref='A1';
but I don't know how to only get the osm_ids that I am interested in, in the order they appear in the route. I do not want to do a bounding box constraint in the where clauses because I am looking for a general solution and also, I am still not sure how the order of the sequence of road segments is saved.
The way I did that was to use pgrouting, namely, their pgr_dijkstra algorithm. I loaded the OSM data into a format fit for use by pgrouting using the osm2pgrouting tool.
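
To illustrate the idea behind a shortest-path query (this is not the pgRouting API, just a plain Python sketch of Dijkstra's algorithm), the network can be treated as segments that share end nodes; the route is then the ordered list of segment ids between two nodes. The segment ids, node ids and lengths below are made up:

```python
import heapq


def shortest_segment_path(segments, start, goal):
    """segments: iterable of (segment_id, node_a, node_b, length).

    Returns the segment ids along the shortest path, in travel order,
    or an empty list if the goal is unreachable or equals the start.
    """
    graph = {}
    for seg_id, a, b, length in segments:
        # Treat every segment as traversable in both directions.
        graph.setdefault(a, []).append((b, length, seg_id))
        graph.setdefault(b, []).append((a, length, seg_id))

    best = {start: 0.0}
    came_from = {}  # node -> (previous node, segment used to reach it)
    queue = [(0.0, start)]
    while queue:
        dist, node = heapq.heappop(queue)
        if node == goal:
            break
        if dist > best.get(node, float("inf")):
            continue  # stale queue entry
        for neighbor, length, seg_id in graph.get(node, []):
            new_dist = dist + length
            if new_dist < best.get(neighbor, float("inf")):
                best[neighbor] = new_dist
                came_from[neighbor] = (node, seg_id)
                heapq.heappush(queue, (new_dist, neighbor))

    if goal not in best:
        return []
    path = []
    node = goal
    while node != start:
        node, seg_id = came_from[node]
        path.append(seg_id)
    return path[::-1]


# Hypothetical network: segment id, two end nodes, length.
segments = [(101, 1, 2, 5.0), (102, 2, 3, 4.0), (103, 2, 4, 3.0), (104, 1, 3, 20.0)]
print(shortest_segment_path(segments, 1, 3))  # [101, 102]
```

The ordering falls out of the shared nodes rather than from any stored sequence number, which is why the data has to be converted into a node/edge topology first.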

Text clustering using MATLAB

I've got a text file which looks like this:
leave messages
enterrement de vie de garçon
sacré coeur
paris skyline
singer montmartre girl audience joined man singing playing guitar front tourists
paris skyline
paris skyline
Each row of this text file corresponds to a document, which I want to cluster using either tf-idf with cosine similarity, or agglomerative clustering. I'm using MATLAB. I've removed the stop words, and punctuation marks.
My issue is that there are 300k of these rows (documents). So scaling is one issue. Another issue is that I'm having trouble understanding how to convert each row of text into a vector of values? Can anybody explain please, with an example?
Thanks.
I tried using k-means clustering (nltk library python) and ran out of memory. Also with k-means I don't have a clue how many clusters I'm supposed to get (so I was just guessing wildly).
Another thing: I have ground truth available for this text (like, I have 0,1,2 labels in another file for this data). And I also have test data (another text file). I'm confused as to how to use this information to help cluster the test data.
Please help. Thanks.
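
For the "row of text to vector" step, one common approach is tf-idf: each distinct word becomes a dimension, weighted by how often it occurs in the row and how rare it is across rows. A minimal sketch in Python rather than MATLAB, using a smoothed IDF variant (one of several in use) and sparse dictionaries so that only words present in a row are stored; the function names are hypothetical:

```python
import math
from collections import Counter


def tfidf_vectors(documents):
    """Return one sparse {term: weight} dict per document."""
    tokenized = [doc.split() for doc in documents]
    # Number of documents each term appears in.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    n_docs = len(tokenized)
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens))
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def cosine_similarity(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


docs = ["paris skyline", "sacré coeur", "paris skyline", "leave messages"]
vecs = tfidf_vectors(docs)
print(cosine_similarity(vecs[0], vecs[2]))  # identical rows: 1.0 up to rounding
print(cosine_similarity(vecs[0], vecs[1]))  # no shared words: 0.0
```

With 300k rows, keeping the vectors sparse like this (or using a sparse matrix type) is what keeps memory manageable; a dense rows-by-vocabulary matrix would likely not fit.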

How do you figure out what the neighboring zipcodes are?

I have a situation that's similar to what goes on in a job search engine where you type in the zipcode where you're searching for a job and the app returns jobs in that zipcode as well as in zipcodes that are 5, 10, 15, 20 or 25 miles from that zipcode, depending on preferences set by the user.
How would you calculate the neighboring locations for a zipcode?
You need to get a list of zip codes with associated longitude / latitude coordinates. Google it - there are plenty of providers.
Then take a look at this question for an algorithm of how to calculate the distance
I don't know if you can count on geonames.org to be around for the life of your app but you could use a web service like theirs to avoid reinventing the wheel.
http://www.geonames.org/export/web-services.html
I wouldn't calculate it; I would store it as a fixed table in the database (changing only when the allocation of ZIP codes changes in a country). Make a relationship "is_neighbor_zip", which holds pairs (smaller, larger). To determine whether two codes are neighbors, check the table for that specific pair. If you want all the neighbors of a ZIP, it might be better to make the table symmetric.
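
The (smaller, larger) convention can be sketched in Python with a set standing in for the database table. The ZIP codes and adjacencies below are made up, and the helper names are hypothetical:

```python
def make_pair(zip_a, zip_b):
    """Order a pair so each adjacency is stored once as (smaller, larger)."""
    return (zip_a, zip_b) if zip_a <= zip_b else (zip_b, zip_a)


# Stand-in for the is_neighbor_zip table; the adjacency data is invented.
NEIGHBOR_PAIRS = {
    make_pair("00002", "00001"),
    make_pair("00002", "00003"),
}


def is_neighbor(zip_a, zip_b):
    """Check a specific pair regardless of argument order."""
    return make_pair(zip_a, zip_b) in NEIGHBOR_PAIRS


def neighbors_of(zip_code):
    """Scan both columns, which is what a non-symmetric table requires."""
    found = set()
    for smaller, larger in NEIGHBOR_PAIRS:
        if smaller == zip_code:
            found.add(larger)
        elif larger == zip_code:
            found.add(smaller)
    return sorted(found)


print(is_neighbor("00003", "00002"))  # True
print(neighbors_of("00002"))          # ['00001', '00003']
```

The two-column scan in `neighbors_of` is the cost of storing each pair once; a symmetric table trades double the rows for a single-column lookup.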
You need to use a GIS database and ask it for ZIP codes that are nearby your current location.
You cannot simply take the ZIP code number and apply some mathematical calculations to find other nearby ZIP codes. ZIP codes are not as geographically scattered as area codes in the US, but they are not a coordinate system.
The only exception is that the ZIP+4 codes are sub-sections of the larger ZIP code. You can assume that any ZIP+4 codes that have the same ZIP code are close to each other.
I used to work on rationalizing the ZIP code handling at a company, here are some practical notes I made:
Testing ZIP codes
Hopefully has other useful info.
Whenever you create a zipcode, geocode it (e.g. with the Google geocoder API, saving the latitude and longitude). Then look up the haversine formula, which calculates the distance (as the crow flies) from a reference point; the reference point can also be geocoded if it is a town or zipcode.
To clarify some more:
When you are retrieving records based on their location, you need to compare each record's longitude and latitude (stored as DECIMAL) with a reference point (your user's geocoded postcode or town name).
You can query:
SELECT * FROM photos p WHERE p.lat > 50 AND p.lat < 60 AND p.long > -10 AND p.long < 10
to find all UK photos etc., because the UK lies between roughly 50 and 60 degrees latitude and within about +-10 degrees of longitude.
If you want to find the distance then you will need to google the haversine formula and plug in your reference values.
Hope this clears things up a little bit more, leave a comment if you need details
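
The distance step described above can be sketched as follows. This is a minimal Python version of the haversine formula, assuming a spherical Earth with a mean radius of 3958.8 miles; the `zips_within` helper and the sample coordinates are hypothetical and only show how the radius search from the question would use it:

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean radius; assumes a spherical Earth


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot.
    return 2 * EARTH_RADIUS_MILES * math.asin(min(1.0, math.sqrt(a)))


def zips_within(center, radius_miles, zip_coords):
    """Return ZIP codes whose stored point lies within radius_miles of center."""
    lat0, lon0 = zip_coords[center]
    return sorted(
        code for code, (lat, lon) in zip_coords.items()
        if code != center
        and haversine_miles(lat0, lon0, lat, lon) <= radius_miles
    )


# Made-up ZIP codes and coordinates for illustration only.
zip_coords = {
    "00001": (40.0, -75.0),
    "00002": (40.1, -75.0),  # about 7 miles north of 00001
    "00003": (41.0, -75.0),  # about 69 miles north of 00001
}
print(zips_within("00001", 10, zip_coords))  # ['00002']
```

For large tables, a bounding-box filter like the SQL query above can cut down the candidates first, with the haversine check applied only to the rows that survive.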