HERE Autocomplete API: Horribly Inaccurate Results, Ignores House Numbers - autocomplete

I'm getting really inaccurate results with the HERE Autocomplete API.
Example:
Searching for "2215 E 2" in US with proximity lat/lon set just blocks away from location:
http://autocomplete.geocoder.api.here.com/6.2/suggest.json?query=2215+e+2&maxresults=50&country=USA&language=en&prox=40.593791,-73.961245&resultType=houseNumber
It returns results that don't have '2215' as the house number, or even a partial string match of "2215 E 2". Here are some examples of incorrectly returned results:
"United States, NY, Brooklyn, 2002 E 2nd St"
"United States, NY, Brooklyn, 2003 E 21st St"
"United States, NY, Brooklyn, 2001 E 22nd St"
"United States, NY, Brooklyn, 2001 E 13th St"
"United States, NY, Brooklyn, 2002 E 8th St"
"United States, NY, Brooklyn, 2001 E 19th St"
It looks like the HERE API completely ignores the house number in many cases. But when I search for the same text in the HERE mobile app, I get correct results, so the app must be using something that is not listed in the API docs.
Logically, the API should first return exact string matches, then partial/fuzzy results.
Are there any additional search operators that need to be used in the query string?
How can I get an exact string match on a partial address, as in the HERE app?

The Autocomplete API tries to find matches even for a single letter. Since you have provided the prox parameter in your request, you are explicitly telling the API that results closer to that point are more important to you. That is why the returned results are ordered by distance from your prox. You can see the distance displayed in your response. Removing prox will order the results by most relevant match.
prox - A type of Spatial Filter. Sets a focus on a geographic area represented by a single geo-coordinate pair and optionally a radius (in meters) so the results within this area are more important than results outside of this area.
Update: Try using mapview for your use case to get the results you need. It is also suggested in developer.here.com/documentation/geocoder-autocomplete/topics/using-autocomplete.html.
http://autocomplete.geocoder.api.here.com/6.2/suggest.json?query=2215+e+2&maxresults=50&country=USA&language=en&mapview=40.593791,-73.961245,;45.2173875,-73.961245,&resultType=houseNumber&beginHighlight=%3Cb%3E&endHighlight=%3C/b%3E
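If the server-side ordering still does not put the typed house number first, one client-side workaround is to re-rank the returned labels so that addresses starting with the typed text come first. The sketch below is not part of the HERE API. The function name `rerank` is hypothetical, the two "2215" labels are made-up sample data, and it assumes the street address is the last comma-separated field of each label, as in the examples quoted in the question.

```python
def rerank(query, labels):
    """Move suggestions whose street address starts with the typed text to the front."""
    needle = query.strip().lower()

    def address_part(label):
        # Assumption: labels look like "United States, NY, Brooklyn, 2002 E 2nd St",
        # so the street address is the last comma-separated field.
        return label.rsplit(",", 1)[-1].strip().lower()

    # sorted() is stable, so the API's own order is preserved within each group.
    # False sorts before True, so prefix matches come first.
    return sorted(labels, key=lambda label: not address_part(label).startswith(needle))


# Hypothetical sample input: two labels from the question plus two invented "2215" labels.
labels = [
    "United States, NY, Brooklyn, 2002 E 2nd St",
    "United States, NY, Brooklyn, 2215 E 21st St",
    "United States, NY, Brooklyn, 2001 E 22nd St",
    "United States, NY, Brooklyn, 2215 E 2nd St",
]
for label in rerank("2215 E 2", labels):
    print(label)
```

This only reorders what the API already returned; it cannot surface an address that is missing from the response.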

Related

I need help in data sanitization problem in tableau

I tried doing the sanitization manually, but I am getting a type mismatch error when performing the calculations.
I also need help sanitizing the data and getting insights as per the instructions below:
sellerproductcount - this column gives you the count of products in the form '1-16 of over 100,000 results', and you can parse out the product count 100,000.
sellerratings - this column gives you the % and count of positive ratings (e.g. 88% positive in the last 12 months (118 ratings)) if parsed correctly.
sellerdetails - you can use this text to parse out phone numbers and email IDs of merchants, where available, so our team can reach out to them.
businessaddress - this will give you the business locations of the sellers. You can parse them to identify if a seller is registered in the US, Germany (DE), or China (CN).
Hero Product 1 #ratings and Hero Product 2 #ratings - these 2 columns give you the number of ratings of the 2 'hero products' or bestselling products of this seller.
I have attached the dataset for the same.
https://docs.google.com/spreadsheets/d/1PSqRCnmFgq7v7RzZaCXXoV0Edp_vM7QO/edit?usp=sharing&ouid=115547990006782902200&rtpof=true&sd=true
Most of this type of data prep can be done with string and RegEx functions like REGEXP_MATCH(). Here are a few examples based on the data you shared:
Seller Product Count
INT(REGEXP_EXTRACT([Sellerproductcount], '(\d*,?\d*) results'))
1-16 of over 6,000 results >> 6000
Seller Rating (Percentage)
INT(REGEXP_EXTRACT([Sellerratings], '(\d*)% positive'))
92% positive in the last 12 months (181 ratings) >> 92
Seller Rating (Count)
INT(REGEXP_EXTRACT([Sellerratings], '(\d*) (?:total )?ratings'))
92% positive in the last 12 months (181 ratings) >> 181
Business Country Code
RIGHT([Businessaddress],2)
AM Treptower Park28-30Berlin12435DE >> DE
These examples all have very straightforward patterns that are present in all rows so they can be done pretty easily with one simple calculation. However, something like sellerdetails which is unstructured, inconsistent, and sometimes incomplete will be a bit more of a challenge. You will need to use a couple of different calculations and techniques combined together to find what you are looking for, as well as some manual data prep. Here's an example of how you can pull out email but it won't work for everything:
Email
REGEXP_EXTRACT([Sellerdetails], '([a-zA-Z0-9.!#$%&’*+/=?^_`{|}~-]+@[a-zA-Z0-9-]+(?:\.[a-zA-Z0-9-]+)*)')
Good luck with your data cleaning, I suggest using sites like https://regex101.com/ and https://regexr.com/ to learn more about and help test regular expressions.
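To try the patterns above outside Tableau, they can be mirrored in a few lines of Python. This is only a rough sketch for testing: the function names are hypothetical, Python's `re` engine is not guaranteed to behave identically to Tableau's, the product-count pattern is slightly widened to `(\d[\d,]*)` so that counts with more than one comma are captured, and the email sample string is invented.

```python
import re


def product_count(text):
    # "1-16 of over 6,000 results" -> 6000 (commas are stripped before converting)
    m = re.search(r"(\d[\d,]*) results", text)
    return int(m.group(1).replace(",", "")) if m else None


def rating_percent(text):
    # "92% positive in the last 12 months (181 ratings)" -> 92
    m = re.search(r"(\d+)% positive", text)
    return int(m.group(1)) if m else None


def rating_count(text):
    # "92% positive in the last 12 months (181 ratings)" -> 181
    m = re.search(r"(\d+) (?:total )?ratings", text)
    return int(m.group(1)) if m else None


def country_code(address):
    # Same idea as RIGHT([Businessaddress], 2): the last two characters.
    return address[-2:]


EMAIL = re.compile(r"[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+@[a-zA-Z0-9-]+(?:\.[a-zA-Z0-9-]+)*")


def first_email(text):
    # Returns the first email-like substring, or None if there is none.
    m = EMAIL.search(text)
    return m.group(0) if m else None


print(product_count("1-16 of over 6,000 results"))
print(rating_percent("92% positive in the last 12 months (181 ratings)"))
print(rating_count("92% positive in the last 12 months (181 ratings)"))
print(country_code("AM Treptower Park28-30Berlin12435DE"))
print(first_email("Contact: info@example.com, Tel 123"))  # invented sample text
```

Each helper returns None when the pattern is absent, which makes rows that need manual cleanup easy to spot.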

How to Rank Text search in Postgresql using multiple keywords where all keywords present should be ranked highest

I have this condition of full text search using multiple keywords - say Education, Healthcare, Nutrition. They are joined using or.
Now I want the search results ranked so that any result containing all the keywords ranks higher than those containing only two of them. That is, results containing Education, Healthcare, Nutrition would be ranked higher than those containing Education, Healthcare or Education, Nutrition or Healthcare, Nutrition.
Similarly, results containing any two keywords would be ranked higher than those having just one keyword. In other words, results containing only Education, Healthcare or Education, Nutrition or Healthcare, Nutrition would be ranked higher than those containing only Education or Nutrition or Healthcare.
The table column being searched is of type tsvector, and the search string is converted to the tsquery datatype in the search query.
I have tried the ts_rank and ts_rank_cd functions, altering the weights and normalization parameters, but without much benefit. If anyone could suggest how these can be used to achieve the goals stated above, or any other way to attain them, it would be very helpful.
Edit After jjanes' comment
In my case I am searching the text of articles, so an article usually contains multiple occurrences of any single keyword. Taking my example above, an article in which each of the keywords Education, Healthcare, Nutrition appears twice would be ranked lower than an article in which only the keywords Education and Healthcare appear 4 times each.
I hope this clears up the confusion raised by @jjanes.
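The ordering being asked for can be stated precisely as a two-part sort key: first the number of distinct keywords present, then the total number of occurrences. The Python sketch below only illustrates that target ordering on plain strings. It is not a PostgreSQL solution, it does no stemming or stop-word handling as full text search would, and the names `rank_key` and `rank` are hypothetical.

```python
import re
from collections import Counter


def rank_key(text, keywords):
    """Return (distinct keywords present, total keyword occurrences)."""
    words = Counter(re.findall(r"[a-z]+", text.lower()))
    counts = [words[k.lower()] for k in keywords]
    distinct = sum(1 for c in counts if c > 0)
    return (distinct, sum(counts))


def rank(articles, keywords):
    # Tuples compare element by element, so keyword coverage always
    # outweighs raw frequency.
    return sorted(articles, key=lambda a: rank_key(a, keywords), reverse=True)


keywords = ["Education", "Healthcare", "Nutrition"]
all_three_twice = "education healthcare nutrition " * 2
two_four_times = "education healthcare " * 4
one_only = "nutrition"

for article in rank([one_only, two_four_times, all_three_twice], keywords):
    print(rank_key(article, keywords), article.strip())
```

With this key, the article containing all three keywords twice sorts ahead of the one containing two keywords four times each, which is the behaviour the question describes as the goal.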

SQL Query sort by closest match

We have a Locations search page that is giving us a challenge I've never run across before.
In our database, we have a list of cities, states, etc. with the corresponding geocodes. All was working fine until now...
We have two locations in a city named "Black River Falls, WI" and we've recently opened one in "River Falls, WI".
So our table has records as follows:
Location   City                State
-------------------------------------
1          Black River Falls   WI
2          Black River Falls   WI
3          River Falls         WI
Obviously our query uses a "LIKE" clause to match city, but when a customer searches the text "River Falls", in the search results, the first results shown are always "Black River Falls".
In our application, we always use the first match, and use it as the default. (We could change it, but it would be a lot of un-budgeted work)
I know I could simply change the sort order to have "River Falls" come up first, but that's a sloppy solution that works only in this one case.
What I'm wondering is if there is a way, through T-SQL (SQL Server 2008r2) to sort by "best match" where "River Falls" would "win" if we search for "River Falls, WI" and "Black River Falls" would work if we search for "Black River Falls" WI.
You can use the "DIFFERENCE" function to search using the closest SOUNDEX match.
SELECT * FROM Locations WHERE City=@City ORDER BY DIFFERENCE(City, @City) DESC
From the MSDN Documentation:
The integer returned is the number of characters in the SOUNDEX values
that are the same. The return value ranges from 0 through 4: 0
indicates weak or no similarity, and 4 indicates strong similarity or
the same values.
DIFFERENCE and SOUNDEX are collation sensitive.
Like this:
;WITH cte AS
(
    SELECT *
        , ROW_NUMBER() OVER(ORDER BY LEN(City)-LEN(@UserText)) AS MatchPrio
    FROM Cities
    WHERE City LIKE '%'+@UserText+'%'
)
SELECT *
FROM cte
WHERE MatchPrio = 1
Update:
You can change the ORDER BY expression above to also use DIFFERENCE(..) or any other combination of criteria.
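The "shortest match wins" idea from the CTE can be tried out without a SQL Server instance. The sketch below swaps T-SQL for SQLite through Python's `sqlite3` module, so `LEN` becomes `LENGTH`, string concatenation uses `||`, and `LIMIT 1` stands in for `MatchPrio = 1`. The function name `best_match` and the tie-break on `Location` are additions for illustration, not part of the answer above.

```python
import sqlite3


def best_match(conn, user_text):
    """Return the matching row whose City is closest in length to user_text."""
    return conn.execute(
        """
        SELECT Location, City, State
        FROM Locations
        WHERE City LIKE '%' || ? || '%'
        ORDER BY LENGTH(City) - LENGTH(?), Location  -- Location breaks ties
        LIMIT 1
        """,
        (user_text, user_text),
    ).fetchone()


# In-memory copy of the three rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Locations (Location INTEGER, City TEXT, State TEXT)")
conn.executemany(
    "INSERT INTO Locations VALUES (?, ?, ?)",
    [
        (1, "Black River Falls", "WI"),
        (2, "Black River Falls", "WI"),
        (3, "River Falls", "WI"),
    ],
)

print(best_match(conn, "River Falls"))
print(best_match(conn, "Black River Falls"))
```

Because every match contains the search text, the length difference is never negative, and an exact match always has a difference of zero and sorts first.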

Group by "original order" causes looping in crystal report 11

I have a report grouped by Themes-S >> Questions-S. There are 8 themes, and each theme has between 5 and 17 questions.
The report has 16 pages.
I need to change the ordering from specific to original. When I do, I end up with 288 pages.
Is something looping? I cannot figure out how to fix this.
(using CR 11)
You just might have a very unoptimized original order, with page break properties set on start/end of group. For example, if your database stores records for 'country' in this order:
Canada
Canada
USA
Canada
USA
Canada
USA
Then with specific order "USA", "Canada", you'd have only 2 groups. With original order, however, you'd have 6 groups. Since the group is changing on (almost) every record, it might seem like it's "looping" over the values, repeating them again.
If you don't want it to do this, you can either (a) not use original order, or (b) change your source data to be better organized.
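The group counts in the country example can be reproduced with a short Python sketch. It assumes, as the answer describes, that original order starts a new group every time the value changes from one record to the next, which is the same thing `itertools.groupby` does on unsorted input.

```python
from itertools import groupby

# Records in the order the database returns them (the example above).
records = ["Canada", "Canada", "USA", "Canada", "USA", "Canada", "USA"]

# Original order: a new group starts whenever the value changes.
original_order_groups = [(country, len(list(run))) for country, run in groupby(records)]

# Specific order: every record with the same value lands in one group.
specific_order_groups = sorted(set(records))

print(original_order_groups)
print(len(original_order_groups), "groups in original order")
print(len(specific_order_groups), "groups in specific order")
```

With a page break on every group, each of those extra groups becomes at least one extra page, which would explain a jump from 16 pages to 288.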

Cannot geocode international airport in Berlin, Germany

I'm trying to geocode an international airport in Berlin, Germany, but the Geocoding API returns ZERO_RESULTS.
The airport in question is Berlin Schönefeld Airport, with IATA code SXF. It is the smaller of the two Berlin airports. I have not been able to geocode this airport with any address query I have tried, some of which are:
http://maps.googleapis.com/maps/api/geocode/json?address=SXF&sensor=false
http://maps.googleapis.com/maps/api/geocode/json?address=Sch%C3%B6nefeld&sensor=false
http://maps.googleapis.com/maps/api/geocode/json?address=Berlin%20Airport,%20Germany&sensor=false
The funny part is that it was possible to geocode it two or three months ago.
You can still geocode another airport that is no longer in use, Tempelhof (THF), which was replaced by Schönefeld a few years ago.
How can I report issues like that?
Google Maps is able to find it without any problems.
EDITED:
For some strange reason, Tegel airport is now not working by its IATA code either, which is TXL.
http://maps.googleapis.com/maps/api/geocode/json?address=TXL&sensor=false
When I search for it using the query "Tegel Airport", one of the results is "Sixt Airport Berlin Tegel, 13405 Berlin, Germany" with the following types: "airport", "transit_station" and "establishment", even though Sixt is a car rental company.
http://maps.googleapis.com/maps/api/geocode/json?address=Tegel%20Airport&sensor=false
What's going on?