How can I dynamically choose which column to join on in SQL?


Say I have the following table in SQL Server (2008):
Person
|PersonID|NickName|FirstName|LastName|
|--------|--------|---------|--------|
|1       |Jim     |James    |Leahy   |
|2       |Mike    |Michael  |Ross    |
|3       |Bob     |Robert   |Helberg |
I want to know if the following is possible in SQL. I have a main table, and I would like to find matches in another table based on the NickName and FirstName columns. However, I want the columns to be tried in a specific order.
I want to join on the first of the columns above (NickName, then FirstName) that matches the Identifier in the table below:
|Identifier|PersonId|
|Jim |1 | <- should return PersonId = 1
|Michael |2 | <- should return PersonId = 2
So if there is a match on NickName then choose the row. If there is no match on NickName then look at FirstName.
Is there any way I can query on NickName and FirstName columns in a particular order?
I don't think COALESCE will work, since none of the columns is guaranteed to be NULL; we only know that a match may not occur on a given column.
Please let me know if you need clarification; I may not have worded this well.

Since you didn't specify the schema of the main table, I assume it has an Identifier varchar(100) column that can contain either a nickname or a first name. In that case the query should look like this:
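select m.identifier,
       PersonId = isnull(p1.PersonID, p2.PersonID)  -- prefer the NickName match, fall back to FirstName
from maintable m
left join persons p1 on p1.nickname = m.identifier   -- candidate match on NickName
left join persons p2 on p2.firstname = m.identifier  -- candidate match on FirstName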
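To check the fallback behaviour, here is a minimal sketch that runs the same two-LEFT-JOIN query in Python's built-in sqlite3 module as a stand-in for SQL Server. Assumptions: SQLite has no two-argument ISNULL function, so COALESCE is used instead (it behaves the same for two arguments); the `maintable.identifier` column is the answer's assumption; and "Zed" is a hypothetical identifier with no match.

```python
import sqlite3


def match_people(identifiers):
    """Return (identifier, PersonID) pairs, preferring a NickName match over a FirstName match."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE persons (PersonID INTEGER, NickName TEXT, FirstName TEXT, LastName TEXT);
        INSERT INTO persons VALUES
            (1, 'Jim',  'James',   'Leahy'),
            (2, 'Mike', 'Michael', 'Ross'),
            (3, 'Bob',  'Robert',  'Helberg');
        CREATE TABLE maintable (identifier TEXT);
    """)
    conn.executemany("INSERT INTO maintable VALUES (?)", [(i,) for i in identifiers])
    rows = conn.execute("""
        SELECT m.identifier,
               COALESCE(p1.PersonID, p2.PersonID) AS PersonID  -- NickName match wins
        FROM maintable AS m
        LEFT JOIN persons AS p1 ON p1.NickName  = m.identifier
        LEFT JOIN persons AS p2 ON p2.FirstName = m.identifier
        ORDER BY m.rowid
    """).fetchall()
    conn.close()
    return rows


print(match_people(["Jim", "Michael", "Zed"]))
# [('Jim', 1), ('Michael', 2), ('Zed', None)]
```

An identifier that matches neither column comes back with a NULL PersonID rather than being dropped, because both joins are LEFT JOINs.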

Related

How to use COUNT() in more than one column?

Let's say I have these 3 tables:
Countries
Id | CountryName
---|------------
1  | USA

ProvOrStates
Id | CId | Name
---|-----|-----
1  | 1   | NY

MajorCities
Id | POSId | Name
---|-------|-----
1  | 1     | NYC
How do you get something like this:
CountryName | ProvinceOrState (Count) | MajorCities (Count)
------------|-------------------------|--------------------
USA         | 50                      | 200
Canada      | 10                      | 57
So far, the way I see it:
1. Run the first SELECT COUNT (GROUP BY Countries.Id) on Countries JOIN ProvOrStates.
2. Store the result in a table variable.
3. Run the second SELECT COUNT (GROUP BY Countries.Id) on ProvOrStates JOIN MajorCities.
4. Update the table variable based on Countries.Id.
5. Join the table variable with the Countries table ON Countries.Id = the table variable's Id.
Is there a possibility to run just one query instead of multiple intermediary queries? I don't know if it's even feasible as I've tried with no luck.
Thanks for helping

Use a subquery, or derived tables and views.
Basically, if you have 3 tables:
select * from [TableOne] as T1
join
(
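select T2.DepName, T2.Column as T2Column, T3.Column as T3Column  -- DepName is needed by the outer ON; aliases avoid duplicate column names
from [TableTwo] as T2
join [TableThree] as T3
on T2.ConditionColumn = T3.ConditionColumn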
) AS DerivedTable
on T1.DepName = DerivedTable.DepName
Once you are 100% sure it's working, you can create a view that contains the three-table join and query it whenever you want.
PS: if you have identical column names, or you get this message:
"The column 'ColumnName' was specified multiple times for 'Table'."
you can use column aliases to solve the problem.
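Applied to the question, the derived-table idea can be sketched as two pre-aggregated subqueries (one row per country each) joined back to Countries. This is a minimal sketch run in Python's sqlite3 as a stand-in for SQL Server; the table and column names follow the question, but the sample rows (including the country "Atlantis" with no provinces) are hypothetical.

```python
import sqlite3


def country_counts():
    """Count provinces/states and major cities per country using pre-aggregated derived tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Countries    (Id INTEGER PRIMARY KEY, CountryName TEXT);
        CREATE TABLE ProvOrStates (Id INTEGER PRIMARY KEY, CId INTEGER, Name TEXT);
        CREATE TABLE MajorCities  (Id INTEGER PRIMARY KEY, POSId INTEGER, Name TEXT);
        -- hypothetical sample rows
        INSERT INTO Countries    VALUES (1, 'USA'), (2, 'Canada'), (3, 'Atlantis');
        INSERT INTO ProvOrStates VALUES (1, 1, 'NY'), (2, 1, 'CA'), (3, 2, 'ON');
        INSERT INTO MajorCities  VALUES (1, 1, 'NYC'), (2, 2, 'LA'), (3, 2, 'SF'), (4, 3, 'Toronto');
    """)
    rows = conn.execute("""
        SELECT c.CountryName,
               COALESCE(p.ProvCount, 0) AS ProvinceOrStateCount,
               COALESCE(m.CityCount, 0) AS MajorCitiesCount
        FROM Countries AS c
        LEFT JOIN (SELECT CId, COUNT(*) AS ProvCount      -- one row per country
                   FROM ProvOrStates
                   GROUP BY CId) AS p
               ON p.CId = c.Id
        LEFT JOIN (SELECT ps.CId, COUNT(*) AS CityCount   -- one row per country
                   FROM MajorCities AS mc
                   JOIN ProvOrStates AS ps ON ps.Id = mc.POSId
                   GROUP BY ps.CId) AS m
               ON m.CId = c.Id
        ORDER BY c.Id
    """).fetchall()
    conn.close()
    return rows


print(country_counts())
# [('USA', 2, 3), ('Canada', 1, 1), ('Atlantis', 0, 0)]
```

Because each derived table is already aggregated to one row per country, joining them cannot multiply rows, so plain COUNT(*) is safe here.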

This answer comes from #lotzInSpace.
SELECT ct.[CountryName], COUNT(DISTINCT p.[Id]), COUNT(DISTINCT c.[Id])
FROM dbo.[Countries] ct
LEFT JOIN dbo.[Provinces] p
ON ct.[Id] = p.[CountryId]
LEFT JOIN dbo.[Cities] c
ON p.[Id] = c.[ProvinceId]
GROUP BY ct.[CountryName]
It's working. I'm using LEFT JOIN instead of INNER JOIN because with an INNER JOIN, a country that has no provinces, or a province that has no cities, would not be displayed.
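The DISTINCT inside each COUNT matters: joining provinces to cities repeats a province once per city, so a plain COUNT(p.Id) would be inflated. A minimal sketch in Python's sqlite3 (standing in for SQL Server; the `dbo.` prefix is dropped and the sample rows are hypothetical) shows both effects, the distinct counts and the zero rows kept by LEFT JOIN:

```python
import sqlite3


def country_counts_distinct():
    """Run the COUNT(DISTINCT ...) + LEFT JOIN query against hypothetical data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Countries (Id INTEGER PRIMARY KEY, CountryName TEXT);
        CREATE TABLE Provinces (Id INTEGER PRIMARY KEY, CountryId INTEGER, Name TEXT);
        CREATE TABLE Cities    (Id INTEGER PRIMARY KEY, ProvinceId INTEGER, Name TEXT);
        INSERT INTO Countries VALUES (1, 'USA'), (2, 'Canada'), (3, 'Atlantis');
        INSERT INTO Provinces VALUES (1, 1, 'NY'), (2, 1, 'CA'), (3, 2, 'ON');
        INSERT INTO Cities    VALUES (1, 1, 'NYC'), (2, 2, 'LA'), (3, 2, 'SF');
    """)
    rows = conn.execute("""
        SELECT ct.CountryName,
               COUNT(DISTINCT p.Id),  -- each province counted once, however many cities it has
               COUNT(DISTINCT c.Id),
               COUNT(p.Id)            -- naive count for comparison: inflated by the city join
        FROM Countries AS ct
        LEFT JOIN Provinces AS p ON ct.Id = p.CountryId
        LEFT JOIN Cities    AS c ON p.Id = c.ProvinceId
        GROUP BY ct.CountryName
        ORDER BY ct.CountryName
    """).fetchall()
    conn.close()
    return rows


print(country_counts_distinct())
# [('Atlantis', 0, 0, 0), ('Canada', 1, 0, 1), ('USA', 2, 3, 3)]
```

For USA the naive count reports 3 provinces (CA appears once per city) while COUNT(DISTINCT p.Id) correctly reports 2; Atlantis and Canada survive only because of the LEFT JOINs.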
Thanks again #lotzInSpace.

Select only the rows with the latest date in postgres

I only want the row with the latest date for each house. The number of entries per house varies: sometimes there is one sale, sometimes several.
Date of sale | house number | street | price |uniqueref
-------------|--------------|--------|-------|----------
15-04-1990 |1 |castle |100000-| 1xzytt
15-04-1995 |1 |castle |200000-| 2jhgkj
15-04-2005 |1 |castle |800000-| 3sdfsdf
15-04-1995 |2 |castle |200000-| 2jhgkj
15-04-2005 |2 |castle |800000-| 3sdfsdf
What I have working is as follows:
I create a view (v_orderedhouses) ordered by house number and street, with the date in descending order so that the latest date comes first.
I then feed that into another view (v_latesthouses) using DISTINCT ON (house number, street), which gives me:
Date of sale | house number | street | price |uniqueref
-------------|--------------|--------|-------|----------
15-04-2005 |1 |castle |800000-| 3sdfsdf
15-04-2005 |2 |castle |800000-| 3sdfsdf
This works but seems like there should be a more elegant solution. Can I get to the filtered view in one step?

You do not need to create a bunch of views, just:
select distinct on(street, house_number)
*
from your_table
order by
street, house_number, -- DISTINCT ON expressions must match the leftmost ORDER BY expressions
date_of_sale desc;
To make this query faster, you could create an index matching the order by:
create index index_name on your_table(street, house_number, date_of_sale desc);
Don't forget to analyze your tables regularly (depending on how fast they grow):
analyse your_table;
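DISTINCT ON is PostgreSQL-specific, so as an illustration only, here is a small Python model of its semantics: sort by the full ORDER BY list, then keep the first row of each (street, house_number) group. It assumes the dates are day-month-year strings as shown in the question and omits the price column; it is not how PostgreSQL executes the query internally.

```python
from datetime import datetime
from itertools import groupby
from operator import itemgetter

# Sample rows copied from the question (price omitted).
sales = [
    {"date_of_sale": "15-04-1990", "house_number": 1, "street": "castle", "uniqueref": "1xzytt"},
    {"date_of_sale": "15-04-1995", "house_number": 1, "street": "castle", "uniqueref": "2jhgkj"},
    {"date_of_sale": "15-04-2005", "house_number": 1, "street": "castle", "uniqueref": "3sdfsdf"},
    {"date_of_sale": "15-04-1995", "house_number": 2, "street": "castle", "uniqueref": "2jhgkj"},
    {"date_of_sale": "15-04-2005", "house_number": 2, "street": "castle", "uniqueref": "3sdfsdf"},
]

# The DISTINCT ON (street, house_number) key.
group_key = itemgetter("street", "house_number")


def sort_key(row):
    """Equivalent of ORDER BY street, house_number, date_of_sale DESC."""
    sale_date = datetime.strptime(row["date_of_sale"], "%d-%m-%Y").date()
    # Negating the day ordinal makes newer dates sort first within each group.
    return (*group_key(row), -sale_date.toordinal())


def distinct_on_latest(rows):
    """Keep only the first row of each (street, house_number) group, as DISTINCT ON does."""
    ordered = sorted(rows, key=sort_key)
    return [next(group) for _, group in groupby(ordered, key=group_key)]


for row in distinct_on_latest(sales):
    print(row["street"], row["house_number"], row["date_of_sale"], row["uniqueref"])
```

The model also shows why date_of_sale DESC must come after the DISTINCT ON columns: the grouping columns lead the sort so that each group is contiguous, and the date only decides which row is first within a group.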

You can use the window function row_number() for this:
select * from (
select your_table.*, row_number() over(partition by street, house_number order by date_of_sale desc) as rn from your_table
) tt
where rn = 1
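A minimal sketch of the row_number() approach, run in Python's sqlite3 (window functions need SQLite 3.25 or newer) as a stand-in for PostgreSQL. Assumptions: dates are stored as ISO 'YYYY-MM-DD' text so they sort chronologically, price is omitted, and the 'market' row is hypothetical, added to show why the partition must include street as well as house_number.

```python
import sqlite3


def latest_sales():
    """Return the latest sale per (street, house_number) using ROW_NUMBER()."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE your_table (date_of_sale TEXT, house_number INTEGER,
                                 street TEXT, uniqueref TEXT);
        -- ISO dates so that text ordering is chronological
        INSERT INTO your_table VALUES
            ('1990-04-15', 1, 'castle', '1xzytt'),
            ('1995-04-15', 1, 'castle', '2jhgkj'),
            ('2005-04-15', 1, 'castle', '3sdfsdf'),
            ('1995-04-15', 2, 'castle', '2jhgkj'),
            ('2005-04-15', 2, 'castle', '3sdfsdf'),
            ('2001-01-01', 1, 'market', 'hypo1');  -- hypothetical: same number, other street
    """)
    rows = conn.execute("""
        SELECT street, house_number, date_of_sale, uniqueref
        FROM (
            SELECT your_table.*,
                   ROW_NUMBER() OVER (PARTITION BY street, house_number
                                      ORDER BY date_of_sale DESC) AS rn
            FROM your_table
        ) AS tt
        WHERE rn = 1
        ORDER BY street, house_number
    """).fetchall()
    conn.close()
    return rows


for row in latest_sales():
    print(row)
```

Partitioning by house_number alone would merge house 1 on castle with house 1 on market and return only one of them.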

This is what I use, and it works fast. It's a generic solution; as far as I have tested, every database system can run it:
SELECT t1.date_of_sale, t1.house_number, t1.street
FROM your_table t1
LEFT JOIN your_table t2 ON (t2.street = t1.street AND t2.house_number = t1.house_number AND t2.date_of_sale > t1.date_of_sale)
WHERE t2.pk IS NULL  -- pk is the table's primary key; NULL means no later sale exists for this house
GROUP BY t1.date_of_sale, t1.house_number, t1.street
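The LEFT JOIN / IS NULL pattern is an anti-join: a row survives only if no other row for the same house has a later date. A minimal sketch in Python's sqlite3, assuming a hypothetical integer primary key column `pk` and ISO-formatted dates (price omitted):

```python
import sqlite3


def latest_sales_anti_join():
    """Keep each sale for which no later sale of the same house exists."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE your_table (pk INTEGER PRIMARY KEY, date_of_sale TEXT,
                                 house_number INTEGER, street TEXT, uniqueref TEXT);
        INSERT INTO your_table (date_of_sale, house_number, street, uniqueref) VALUES
            ('1990-04-15', 1, 'castle', '1xzytt'),
            ('1995-04-15', 1, 'castle', '2jhgkj'),
            ('2005-04-15', 1, 'castle', '3sdfsdf'),
            ('1995-04-15', 2, 'castle', '2jhgkj'),
            ('2005-04-15', 2, 'castle', '3sdfsdf');
    """)
    rows = conn.execute("""
        SELECT t1.street, t1.house_number, t1.date_of_sale
        FROM your_table AS t1
        LEFT JOIN your_table AS t2
               ON t2.street = t1.street
              AND t2.house_number = t1.house_number
              AND t2.date_of_sale > t1.date_of_sale
        WHERE t2.pk IS NULL            -- no later sale was found
        GROUP BY t1.street, t1.house_number, t1.date_of_sale
        ORDER BY t1.street, t1.house_number
    """).fetchall()
    conn.close()
    return rows


print(latest_sales_anti_join())
# [('castle', 1, '2005-04-15'), ('castle', 2, '2005-04-15')]
```

The GROUP BY only collapses exact ties; if two sales of the same house share the latest date, both still qualify before grouping.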

table with two nullable foreign keys join results into single column

Sorry if the title isn't very descriptive. I have a table like this example, and I'm using SQL Server 2012:
PersonId | PetID
and I want to join it to the following two tables:
PersonId | PersonName | PersonAsset
AnimalId | AnimalName | AnimalAsset
So the end result is:
PersonId | PetId | Name | Asset
---------|-------|------|------
1        | null  | Dave | 1
null     | 1     | Fido | 2

The output you require can be achieved by using a LEFT JOIN for your two tables and ISNULL for the required fields.
For example (assuming the first table is named 'common'):
SELECT common.PersonId,
common.PetId,
ISNULL(person.PersonName, animal.AnimalName) AS Name,
ISNULL(person.PersonAsset, animal.AnimalAsset) AS Asset
FROM common
LEFT JOIN person ON common.PersonId = person.PersonId
LEFT JOIN animal ON common.PetId = animal.AnimalId
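A minimal sketch of the same query in Python's sqlite3 as a stand-in for SQL Server. Assumptions: the table names `common`, `person` and `animal` are the answer's; COALESCE replaces ISNULL because SQLite has no two-argument ISNULL function; the rows are the ones from the question's expected output.

```python
import sqlite3


def merged_rows():
    """Collapse person and animal columns into single Name/Asset columns."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE common (PersonId INTEGER, PetId INTEGER);
        CREATE TABLE person (PersonId INTEGER, PersonName TEXT, PersonAsset INTEGER);
        CREATE TABLE animal (AnimalId INTEGER, AnimalName TEXT, AnimalAsset INTEGER);
        INSERT INTO common VALUES (1, NULL), (NULL, 1);
        INSERT INTO person VALUES (1, 'Dave', 1);
        INSERT INTO animal VALUES (1, 'Fido', 2);
    """)
    rows = conn.execute("""
        SELECT common.PersonId,
               common.PetId,
               COALESCE(person.PersonName, animal.AnimalName)   AS Name,
               COALESCE(person.PersonAsset, animal.AnimalAsset) AS Asset
        FROM common
        LEFT JOIN person ON common.PersonId = person.PersonId  -- NULL PersonId never matches
        LEFT JOIN animal ON common.PetId = animal.AnimalId     -- NULL PetId never matches
        ORDER BY common.rowid
    """).fetchall()
    conn.close()
    return rows


print(merged_rows())
# [(1, None, 'Dave', 1), (None, 1, 'Fido', 2)]
```

Since each row of `common` has exactly one non-NULL key, only one of the two joins finds a match, and COALESCE picks that side's values.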

Modifying Duplicates

I'm trying to figure out how to do two things:
1. Locate duplicate records in a table. These are typically duplicate names in the 'Name' column, but specifically those where the ParentID is the same. It's fine to have identical names with different ParentIDs, because those names (or children) belong to different parents.
2. Modify these duplicates, preferably by appending the 'ID' to the name.
I came up with a query to locate duplicates and then dump them into a temp table:
CREATE TABLE #Dup(
Name varchar(50),
CustNo varchar(7))
insert into #Dup (Name, CustNo)
SELECT [Name],[CustNo]
FROM [02Kids]
GROUP BY [Name], [CustNo]
HAVING Count(*)>1
This seems to work. When I view the data in the temp table, I see the name and the ParentID, confirming that this name appears twice for that ParentID. It's worth noting that each name appears only once in the temp table; it doesn't show two rows with the same name and ID (perhaps this is part of my problem).
Here's the query I came up with attempting to perform the modification:
select[#Dup].[Name] + ' ' + [02Kids].[ID] as iName, [02Kids].ParentID
from #Dup
inner join [02Kids]
on #Dup.CustNo = [02Kids].ParentID
order by iName asc
Well, this sort of works, except I end up with massive amounts of duplicates. For example, one "Name" that I can confirm only has two duplicates ends up with close to 13 in total from that select query.
I may be way off with that query (this is practice material I'm using to teach myself), but I'm having trouble working out a correct way to do this. I'm still learning syntax, keywords, functions, etc., so maybe there's something I should use that I just don't know about yet.

Well, to get only the matches you want in your "modification" query, you'll need to add a match on Name to your join clause. Right now you are matching each duplicate record to every kid of that parent, not just to the duplicates. So if one parent has 13 kids, only one of whose names is duplicated, you'll get 13 rows back for that one duplicate.
inner join [02Kids]
on #Dup.CustNo = [02Kids].ParentID AND
#Dup.Name = [02Kids].Name
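The fan-out is easy to reproduce. This minimal sketch uses Python's sqlite3 with hypothetical data: the tables are renamed `Kids` and `Dup` (SQL Server's `#` temp-table prefix does not apply in SQLite), and parent 10 has four kids, two of them named Ann.

```python
import sqlite3


def joined_ids(match_name):
    """Return the Kids.ID values produced by the join, with or without the Name condition."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Kids (ID INTEGER, Name TEXT, ParentID INTEGER);
        INSERT INTO Kids VALUES
            (1, 'Ann', 10), (2, 'Ann', 10), (3, 'Ben', 10), (4, 'Cal', 10), (5, 'Ann', 20);
        -- duplicate names per parent, as in the question's #Dup table
        CREATE TABLE Dup AS
            SELECT Name, ParentID AS CustNo
            FROM Kids
            GROUP BY Name, ParentID
            HAVING COUNT(*) > 1;
    """)
    sql = "SELECT k.ID FROM Dup AS d JOIN Kids AS k ON d.CustNo = k.ParentID"
    if match_name:
        sql += " AND d.Name = k.Name"  # the fix from the answer above
    ids = [row[0] for row in conn.execute(sql + " ORDER BY k.ID")]
    conn.close()
    return ids


print(joined_ids(False))  # [1, 2, 3, 4]: every kid of parent 10
print(joined_ids(True))   # [1, 2]: only the duplicated Ann rows
```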

Does this answer your question?
USE tempdb
GO
CREATE TABLE Person (PersonID INT, FName VARCHAR(25), LName VARCHAR(25))
INSERT INTO Person VALUES
(1, 'Jim', 'Jones'),
(2, 'Rob', 'Smith'),
(3, 'Matt', 'Bridges'),
(4, 'Jim', 'Jones'),
(5, 'Jim', 'Jones'),
(6, 'Alex', 'Door'),
(7, 'Wilhelm', 'Kay')
GO
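;WITH DupDetect AS
(
-- number the rows within each (FName, LName) group; the lowest PersonID gets Occ = 1
SELECT *
,Occ = ROW_NUMBER() OVER (PARTITION BY FName, LName ORDER BY PersonID)
FROM Person
)
-- updating the CTE updates the underlying Person rows; every occurrence after the first gets its ID prefixed
UPDATE DupDetect
SET FName = LTRIM(STR(PersonID)) + FName
WHERE Occ > 1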
SELECT *
FROM Person
Resulting in:
PersonID | FName | LName
---------------------------------
1 | Jim | Jones
2 | Rob | Smith
3 | Matt | Bridges
4 | 4Jim | Jones
5 | 5Jim | Jones
6 | Alex | Door
7 | Wilhelm | Kay
I'm unaware of any cleaner or more efficient pattern for the modification or removal of duplicates.
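For engines that do not allow a CTE as the UPDATE target, the same ROW_NUMBER() idea can be applied through an IN subquery. This is a minimal sketch in Python's sqlite3 (SQLite 3.25 or newer for window functions), reusing the answer's sample data; `||` replaces T-SQL's `LTRIM(STR(...)) +` concatenation.

```python
import sqlite3


def rename_duplicates():
    """Prefix every repeated (FName, LName) occurrence after the first with its PersonID."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Person (PersonID INTEGER, FName TEXT, LName TEXT);
        INSERT INTO Person VALUES
            (1, 'Jim', 'Jones'), (2, 'Rob', 'Smith'), (3, 'Matt', 'Bridges'),
            (4, 'Jim', 'Jones'), (5, 'Jim', 'Jones'), (6, 'Alex', 'Door'),
            (7, 'Wilhelm', 'Kay');
        -- SQLite does not allow a CTE as the UPDATE target, so the numbered rows
        -- are selected in a subquery and matched by PersonID instead
        UPDATE Person
        SET FName = PersonID || FName
        WHERE PersonID IN (
            SELECT PersonID
            FROM (SELECT PersonID,
                         ROW_NUMBER() OVER (PARTITION BY FName, LName
                                            ORDER BY PersonID) AS Occ
                  FROM Person)
            WHERE Occ > 1
        );
    """)
    rows = conn.execute("SELECT PersonID, FName, LName FROM Person ORDER BY PersonID").fetchall()
    conn.close()
    return rows


for row in rename_duplicates():
    print(row)
```

This relies on PersonID being unique; the output should match the result table above.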

How to group by in DB2 IBM and get the first item in each group?

I have a table like this:
|sub_account|name|email|
|-----------|----|-----|
// same account and same name: email different
|a1 |n1 |e1 |
|a1 |n1 |e2 |
// same account, name and email
|a2 |n2 |e3 |
|a2 |n2 |e3 |
I would like a query to get a table like this:
|sub_account|name|email|
|-----------|----|-----|
// nothing to do here
|a1 |n1 |e1 |
|a1 |n1 |e2 |
// remove the one that is exactly the same, but leave at least one
|a2 |n2 |e3 |
I've tried:
select sub_account, name, first(email)
from table
group by sub_account, name
but as you know, FIRST doesn't exist in DB2. What is the alternative?
thanks

select sub_account, name, email
from table
group by sub_account, name, email

I'm not sure about DB2, but in SQL Server you can use DISTINCT for this. You may try:
SELECT DISTINCT sub_account, name, email
from TABLE
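Both of the previous two answers remove only rows that are identical in all three columns. A minimal sketch comparing them in Python's sqlite3, using the question's rows in a table named `accounts` (a hypothetical name, since TABLE is a reserved word):

```python
import sqlite3


def dedupe(query):
    """Run a de-duplication query against the question's sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE accounts (sub_account TEXT, name TEXT, email TEXT);
        INSERT INTO accounts VALUES
            ('a1', 'n1', 'e1'), ('a1', 'n1', 'e2'),
            ('a2', 'n2', 'e3'), ('a2', 'n2', 'e3');
    """)
    rows = conn.execute(query).fetchall()
    conn.close()
    return rows


DISTINCT_QUERY = "SELECT DISTINCT sub_account, name, email FROM accounts ORDER BY 1, 2, 3"
GROUP_BY_QUERY = """SELECT sub_account, name, email FROM accounts
                    GROUP BY sub_account, name, email ORDER BY 1, 2, 3"""

print(dedupe(DISTINCT_QUERY))
print(dedupe(GROUP_BY_QUERY))
# both: [('a1', 'n1', 'e1'), ('a1', 'n1', 'e2'), ('a2', 'n2', 'e3')]
```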

Create a subquery with the table values plus a counter (pos) that increases for each row and resets to 1 each time a new sub_account + name combination is reached.
The final query filters out all results from the subquery other than those with pos 1 (i.e. first entries of the group):
select *
from (
select sub_account, name, email,
ROW_NUMBER() OVER (PARTITION BY sub_account, name
ORDER BY email DESC) AS pos
from table
) AS t
where pos = 1
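A minimal sketch of the ROW_NUMBER() query in Python's sqlite3 (SQLite 3.25 or newer), using the question's rows in a hypothetical `accounts` table. It shows that partitioning by (sub_account, name) keeps one row per group, so only one of a1's two emails survives; partitioning by all three columns instead keeps one row per exact duplicate, which matches the question's desired output.

```python
import sqlite3


def first_per_group(partition_cols):
    """Keep the pos = 1 row of each partition; partition_cols is a fixed, trusted string."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE accounts (sub_account TEXT, name TEXT, email TEXT);
        INSERT INTO accounts VALUES
            ('a1', 'n1', 'e1'), ('a1', 'n1', 'e2'),
            ('a2', 'n2', 'e3'), ('a2', 'n2', 'e3');
    """)
    rows = conn.execute(f"""
        SELECT sub_account, name, email
        FROM (SELECT sub_account, name, email,
                     ROW_NUMBER() OVER (PARTITION BY {partition_cols}
                                        ORDER BY email DESC) AS pos
              FROM accounts) AS t
        WHERE pos = 1
        ORDER BY sub_account, name, email
    """).fetchall()
    conn.close()
    return rows


print(first_per_group("sub_account, name"))         # one row per (sub_account, name)
print(first_per_group("sub_account, name, email"))  # one row per exact duplicate
```

With ORDER BY email DESC, the row kept for a1/n1 is e2; change the ORDER BY to control which row counts as "first".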

I found a way:
SELECT sub_account,
name,
CASE WHEN split_index=0 THEN MyList ELSE SUBSTR(MyList,1,LOCATE('|',MyList)-1) END
FROM (select sub_account, name, LISTAGG(email,'|') WITHIN GROUP (ORDER BY email) as MyList, LOCATE('|',LISTAGG(email,'|') WITHIN GROUP (ORDER BY email)) AS split_index
from TABLE
group by sub_account, name) AS TABLEA
This aggregates the emails of each group into one '|'-separated list, then splits the list and keeps the first entry.
