SQL Query sort by closest match

We have a Locations search page that is giving us a challenge I've never run across before.
In our database, we have a list of cities, states, etc. with the corresponding geocodes. All was working fine until now...
We have two locations in a city named "Black River Falls, WI" and we've recently opened one in "River Falls, WI".
So our table has records as follows:
Location  City               State
----------------------------------
1         Black River Falls  WI
2         Black River Falls  WI
3         River Falls        WI
Our query uses a LIKE clause to match the city, but when a customer searches for the text "River Falls", the first results shown are always "Black River Falls".
In our application, we always take the first match and use it as the default. (We could change that, but it would be a lot of unbudgeted work.)
I know I could simply change the sort order so that "River Falls" comes up first, but that is a sloppy solution that works only in this one case.
What I'm wondering is whether there is a way in T-SQL (SQL Server 2008 R2) to sort by "best match", so that "River Falls" would win if we search for "River Falls, WI" and "Black River Falls" would win if we search for "Black River Falls, WI".

You can use the DIFFERENCE function to sort by the closest SOUNDEX match.
SELECT * FROM Locations WHERE City LIKE '%' + @City + '%' ORDER BY DIFFERENCE(City, @City) DESC
From the MSDN Documentation:
The integer returned is the number of characters in the SOUNDEX values
that are the same. The return value ranges from 0 through 4: 0
indicates weak or no similarity, and 4 indicates strong similarity or
the same values.
DIFFERENCE and SOUNDEX are collation sensitive.

Like this:
;WITH cte AS
(
    SELECT *
         , ROW_NUMBER() OVER (ORDER BY LEN(City) - LEN(@UserText)) AS MatchPrio
    FROM Cities
    WHERE City LIKE '%' + @UserText + '%'
)
SELECT *
FROM cte
WHERE MatchPrio = 1
Update:
You can change the ORDER BY expression above to also use DIFFERENCE(..) or any other combination of criteria.
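The length-difference idea can also be tried outside the database. The following is a minimal Python sketch, assuming an in-memory list of city names; the `cities` list and the `best_matches` helper are hypothetical and only mirror the filter and ordering of the CTE above. Case-insensitive matching is assumed, as under a typical case-insensitive collation.

```python
def best_matches(cities, user_text):
    """Return the cities containing user_text, closest length first."""
    needle = user_text.lower()
    # Same filter as LIKE '%' + @UserText + '%' (case-insensitive, by assumption).
    hits = [c for c in cities if needle in c.lower()]
    # Fewer extra characters means a closer match; sorted() is stable,
    # so ties keep their original order.
    return sorted(hits, key=lambda c: len(c) - len(needle))


cities = ["Black River Falls", "Black River Falls", "River Falls"]
print(best_matches(cities, "River Falls")[0])        # River Falls
print(best_matches(cities, "Black River Falls")[0])  # Black River Falls
```

Taking the first element corresponds to `WHERE MatchPrio = 1`; an empty result list means nothing matched.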

Related

How to Rank Text search in Postgresql using multiple keywords where all keywords present should be ranked highest

I have a full text search that uses multiple keywords, say Education, Healthcare, Nutrition. They are joined using OR.
Now I want the search results ranked so that any result containing all the keywords ranks higher than those containing only two of them. That is, results containing Education, Healthcare, and Nutrition should rank higher than those containing only Education and Healthcare, or Education and Nutrition, or Healthcare and Nutrition.
Similarly, results containing any two keywords should rank higher than those containing just one. In other words, results containing only Education and Healthcare, or Education and Nutrition, or Healthcare and Nutrition should rank higher than those containing only Education, only Nutrition, or only Healthcare.
The table column being searched is of type tsvector, and the search string is converted to the tsquery datatype in the search query.
I have tried the ts_rank and ts_rank_cd functions, altering the weights and normalization parameters, but without much benefit. If anyone could suggest how these can be used to achieve the goals stated above, or any other way to attain them, it would be very helpful.
Edit After jjanes' comment
In my case I am searching the text of articles, so an article typically contains multiple occurrences of any single keyword. Taking my example above, an article in which each of the keywords Education, Healthcare, and Nutrition appears twice would be ranked lower than an article in which only Education and Healthcare appear four times each.
I hope this clears up the confusion mentioned by @jjanes.
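The ordering asked for here can be stated as a two-part sort key: the number of distinct keywords present first, and raw frequency only as a tie-breaker. The sketch below is plain Python, not PostgreSQL's ts_rank; the helper names, the sample articles, and the naive tokenization (lowercase words, no stemming) are all assumptions made for illustration.

```python
import re
from collections import Counter


def rank_key(text, keywords):
    """Sort key: (distinct keywords present, total keyword occurrences)."""
    words = Counter(re.findall(r"[a-z]+", text.lower()))
    counts = [words[k.lower()] for k in keywords]
    distinct = sum(1 for n in counts if n > 0)
    return (distinct, sum(counts))


def rank_articles(articles, keywords):
    """Order articles so that covering more distinct keywords always wins."""
    return sorted(articles, key=lambda a: rank_key(a, keywords), reverse=True)


keywords = ["education", "healthcare", "nutrition"]
articles = [
    "Education education education education healthcare healthcare healthcare healthcare",
    "Education and healthcare and nutrition",
    "Nutrition only",
]
for article in rank_articles(articles, keywords):
    print(rank_key(article, keywords), article)
```

With this key, the article mentioning all three keywords once sorts ahead of the article mentioning two keywords four times each. In PostgreSQL the same two-level ordering could presumably be expressed by sorting first on the number of keywords that individually match the document, and only then on ts_rank.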

Post-Aggregation Join of two tables in Tableau

I'm new to Tableau and need to do some kind of post-aggregation join, I think. My goal is to match some data from Google Search Console to other regional data concerning hotels. This way, I hope to see whether hotels in a certain region perform better or worse than their popularity in Google searches would suggest.
I have one table with the hotel-data which looks like this:
Table 1
Here we have three hierarchical region levels: country, state, and region (and some KPI that is aggregated according to the drill-down level).
Table 2
Table 2 does not follow the same hierarchical structure as Table 1, but it covers the same regions.
What I want Tableau to do:
I want Tableau to join the regions on the lowest region level, but NOT to aggregate the impressions KPI. So when I drill up to the country level, I want the "random KPI" to be summed to 389, but the impressions should be 40,000 only. You might ask why: it is a different thing if somebody searches only for "country 1" than if they search for a state or region of that country. For this analysis, the goal is not to aggregate the impressions across regions.
I would be glad for any hints on how to do this. I thought about doing a blend, which I understood to be a kind of post-aggregation join, but I found that if I join the lowest region level of Table 1 with the region variable of Table 2, the impressions always get aggregated.
Thanks everyone!

Tableau Mixed Data

I've been tasked to set up a Tableau worksheet of counts of data (ultimately to create percentages) where the contrived incoming data looks like the following.
id fruit
1 apple
1 orange
1 lemon
2 apple
2 orange
3 apple
3 orange
4 lemon
4 orange
The worksheet needs to look something like the following:
Count of ids
2 Lemons
2 No lemons
I've only been using Tableau for about 4 hours, so is this doable? Can anyone point me in the right direction?
The data is coming in from a SQL Server database in a format that I can control if that helps contribute towards a solution.

Alex's solution based on sets is very good for this scenario, but I would like to show that LODs can be more flexible if you need to extend your solution to include more categories.
For the current scenario, create a calculated field with the formula below and build a text table using COUNTD([Id]):
{FIXED [Id]:IF MAX([Fruit]='lemon') THEN 'Lemon' ELSE 'No Lemon' END}
Now for the extension: suppose you want to count Ids with Lemon, Ids with Apple, and all others. Since no Id may be counted twice, the categorization follows the order of the conditions. (This kind of precedence would be a headache without LODs.)
You can change your calculation as follows:
{FIXED [Id]: IF MAX([Fruit]='lemon') THEN 'Lemon'
    ELSEIF MAX([Fruit]='apple') THEN 'Apple'
    ELSE 'No Lemon or Apple' END}
Now your visualization automatically changes to include the new category. This can be extended for any number of fruits.
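To see why the order of the conditions matters, here is a minimal Python sketch of the same per-Id categorization, using the sample rows from the question. It only imitates what the FIXED LOD expression computes; the `categorize` helper is a hypothetical name and none of this is Tableau code.

```python
from collections import Counter, defaultdict

rows = [
    (1, "apple"), (1, "orange"), (1, "lemon"),
    (2, "apple"), (2, "orange"),
    (3, "apple"), (3, "orange"),
    (4, "lemon"), (4, "orange"),
]


def categorize(fruits):
    """Mirror the IF / ELSEIF chain: the first matching fruit wins."""
    if "lemon" in fruits:
        return "Lemon"
    if "apple" in fruits:
        return "Apple"
    return "No Lemon or Apple"


# Rough equivalent of {FIXED [Id]: ...}: gather every fruit belonging to one id.
fruits_by_id = defaultdict(set)
for row_id, fruit in rows:
    fruits_by_id[row_id].add(fruit)

# Rough equivalent of COUNTD([Id]) per category: each id is counted exactly once.
counts = Counter(categorize(fruits) for fruits in fruits_by_id.values())
print(dict(counts))  # {'Lemon': 2, 'Apple': 2}
```

Id 1 has both a lemon and an apple, but it lands in 'Lemon' only, because that test comes first.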

This is a good use for a set.
In the data pane on the left sidebar, right-click on the Id field and create a set named "Ids that contain at least one lemon" (or use a shorter, less precise name).
In the set definition dialog panel, define the set by choosing "Use all" from the General tab, and then on the Condition tab, define the condition by the formula max([Fruit]="lemon")
There are many ways to think of a set, but the most abstract is just as a mathematical set of Ids that satisfy the condition. Remember each Id has many data rows, so the condition is a function of many data rows and uses the aggregation function MAX(). For booleans, True is treated as greater than False, so MAX() will return True if at least one of the data rows satisfies the condition. By contrast, MIN() is True only if ALL (non-null) data rows satisfy the condition.
Once you have a set that separates your Ids into lemon-scented Ids and others, you can use that set in many ways: in calculated fields, in filters, in combination with other sets to make new sets, and of course on shelves to make visualizations.
To get a result like the one your question seeks, you could put your new set on the Rows shelf, and put COUNTD([Id]) (shown as CNTD(Id) on the shelf) on the Text shelf or Columns shelf. Make sure you understand why you need count distinct (COUNTD) instead of SUM([Number of Records]) here.
BTW, the LOD calculation { fixed [Id] : max([Fruit]="lemon") } is effectively the same solution.
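Python orders booleans the same way (True is greater than False), so the MAX() versus MIN() distinction in the set condition can be checked with a short sketch. The rows are the sample data from the question; the variable names are hypothetical.

```python
rows = [
    (1, "apple"), (1, "orange"), (1, "lemon"),
    (2, "apple"), (2, "orange"),
    (3, "apple"), (3, "orange"),
    (4, "lemon"), (4, "orange"),
]

fruits_by_id = {}
for row_id, fruit in rows:
    fruits_by_id.setdefault(row_id, []).append(fruit)

# max() over booleans is True when at least one row matches (like any()).
has_lemon = {i: max(f == "lemon" for f in fruits) for i, fruits in fruits_by_id.items()}
# min() over booleans is True only when every row matches (like all()).
only_lemon = {i: min(f == "lemon" for f in fruits) for i, fruits in fruits_by_id.items()}

# The "set" is simply the ids whose condition evaluated to True.
lemon_set = {i for i, flag in has_lemon.items() if flag}
print(sorted(lemon_set))                                   # [1, 4]
print(len(lemon_set), len(fruits_by_id) - len(lemon_set))  # 2 2
```

Counting the members of `lemon_set` rather than the matching rows is the reason a distinct count of Ids is needed in the view.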

Using COUNT in Tableau to count observations by group

Thanks in advance for any advice you can offer! I'm building a Tableau dashboard to explore housing affordability and school quality in different neighborhoods in my area. A user will select their occupation and see a graph of neighborhoods plotted based on school quality and housing affordability. To explore housing affordability, I'm using county level assessor data with the valuation of every property matched to neighborhoods.
The goal is to display the percentage of homes in an area that are affordable given the median occupational wages for the job a user selected. Right now, I'm trying to use a calculated field with COUNT([Parcels]<[Occupation])/COUNT([Parcels]), but I need to find a way to count the number of properties in each specific neighborhood below the cut off value.
Does anyone know of a way to count elements of a particular group in this way in Tableau?
I'm on a Mac, using Tableau Desktop, and doing the back end analysis work in R. Thank you!

You seem to misunderstand what the function COUNT() does. You are certainly not alone. Count() behaves in Tableau almost identically to how it does with SQL.
Count([some field]) returns the number of data rows where the value for [some field] is not null. It does not return the number of rows where [some field] evaluates to true, or a positive number, or anything else.
If [some field] always has a non-null value, then Count([some field]) is the same as SUM([Number of Records]). If [some field] is always null, then Count([some field]) is zero. Count() is not like Excel's CountIf function.
If you want to count data rows that meet a condition, you could try COUNT(if [condition] then 1 end). Since the missing ELSE case defaults to null, that expression counts only the rows where [condition] is true.
So one way to get the percentage of affordable homes is count(if [affordable] then 1 end) / count(1), assuming each data row represents a home. Then format your field to display as a percentage. Another option is to learn to use quick table calcs.
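The difference between counting a boolean expression and counting a conditional with no ELSE branch can be imitated in Python, with None standing in for null. This is a sketch of the counting rule only; the `count` helper, the parcel values, and the cutoff are made up for illustration.

```python
def count(values):
    """Like COUNT(expr): the number of non-null (non-None) values."""
    return sum(1 for v in values if v is not None)


# Hypothetical assessed values and affordability cutoff, for illustration only.
parcel_values = [120_000, 310_000, 95_000, 450_000]
cutoff = 200_000

# COUNT([Parcels] < [Occupation]): the comparison is never null, so every row counts.
naive = count(v < cutoff for v in parcel_values)

# COUNT(IF [condition] THEN 1 END): rows failing the test become None and are skipped.
affordable = count(1 if v < cutoff else None for v in parcel_values)

print(naive, affordable, affordable / count(parcel_values))  # 4 2 0.5
```

The naive version returns the total row count, which is why the original formula always came out as 100%.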

If you want to display the number of rows in a given visualized table, you could also use SIZE().
Source, official docs:
https://help.tableau.com/current/pro/desktop/en-us/functions_functions_tablecalculation.htm#size

Greatest n per group with multiple criteria for greatest

I need to select the largest, most recent or currently active term across a number of schools, with the assumption that it is possible for a school to have multiple concurrent terms (i.e., one term that honors students are registered in, and another for non-honors students). I also need to take the end date into account, as the honors term may have the same start date but may be a year long instead of just a semester, and I want the semester.
Code looks something like this:
SELECT t.school_id, t.term_id, COUNT(s.id) AS size, t.start_date, t.end_date
FROM   term t
INNER  JOIN students s ON t.term_id = s.term_id
WHERE  t.school_id = (some school id)
GROUP  BY t.school_id, t.term_id
ORDER  BY t.start_date DESC, t.end_date ASC, size DESC
LIMIT  1;
This works perfectly to find the largest currently or most recently active term, but I want to be able to eliminate the WHERE t.school_id = (some school id) part.
A standard greatest n per group can easily choose the largest OR most recent term, but I need to select the most recent term that ends soonest with the largest number of students.

I am not sure I am interpreting your question correctly. It would be easier if you had supplied table definitions, including primary and foreign keys.
If you want the most recent term that ends soonest with the largest number of students per school, this might do it:
SELECT DISTINCT ON (t.school_id)
       t.school_id, t.term_id, s.size, t.start_date, t.end_date
FROM   term t
JOIN  (
   -- count students per term; the column is referenced without an alias
   -- because none is defined inside this subquery
   SELECT term_id, COUNT(id) AS size
   FROM   students
   GROUP  BY term_id
   ) s USING (term_id)
ORDER  BY t.school_id, t.start_date DESC, t.end_date, s.size DESC;
More explanation for DISTINCT ON in this related answer:
Select first row in each GROUP BY group?
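What DISTINCT ON does with that ORDER BY can be imitated in a few lines of Python: sort by all the criteria, then keep the first row seen for each school. The sample terms below are invented for illustration, and the tuple layout is an assumption, not the real schema.

```python
from datetime import date

# Hypothetical rows: (school_id, term_id, size, start_date, end_date).
terms = [
    (1, "fall-honors", 40, date(2013, 9, 1), date(2014, 6, 1)),
    (1, "fall", 300, date(2013, 9, 1), date(2013, 12, 20)),
    (1, "spring", 280, date(2013, 1, 10), date(2013, 5, 30)),
    (2, "fall-a", 150, date(2013, 9, 1), date(2013, 12, 20)),
    (2, "fall-b", 200, date(2013, 9, 1), date(2013, 12, 20)),
]

# ORDER BY school_id, start_date DESC, end_date ASC, size DESC.
ordered = sorted(
    terms,
    key=lambda t: (t[0], -t[3].toordinal(), t[4].toordinal(), -t[2]),
)

# DISTINCT ON (school_id): keep only the first row seen for each school.
picked = {}
for term in ordered:
    picked.setdefault(term[0], term)

for school_id, term in picked.items():
    print(school_id, term[1], term[2])
```

For school 1 the semester-long "fall" term wins over the year-long honors term because it ends sooner; for school 2 the two terms tie on dates, so the larger one is kept.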
