Checking two values from two tables for duplicates


I have an existing check that looks at the Name table for duplicate names (Full_Name), but how can I check for duplicate Name and Address combinations? FULL_ADDRESS lives in the Address table, and when I try to combine these two values and check them against the DB as a single value, everything breaks.
Select Name.ID, Name.Full_Name, Concat(Name.Full_Name,' ', Address.FULL_ADDRESS) as Comb
FROM Name INNER JOIN Address ON Name.ID = Address.ID
Where Full_Name != '' AND having count(Comb)>1
group by Full_Name

-- Sample data (DECLARE creates table variables, which take an @ prefix)
DECLARE @Name TABLE (ID INT, Full_Name NVARCHAR(50))
DECLARE @Address TABLE (ID INT, FULL_ADDRESS NVARCHAR(100))
INSERT INTO @Name VALUES
(1,'Alex Zoolittle')
,(2,'Brian Yakami')
,(3,'Charles Xylogon')
,(4,'Brian Yakami')
INSERT INTO @Address VALUES
(1,'123 Westwood Way, Los Angeles, CA 95043')
,(2,'234 Eastwood Lane, Los Gatos, CA 95030')
,(3,'345 Northwood Blvd, Los Alamos, NM 83241')
,(4,'234 Eastwood Lane, Los Gatos, CA 95030')
-- Number the rows within each name + address combination
;WITH Comb
AS (
SELECT na.ID, na.Full_Name, CONCAT(na.Full_Name,' ', ad.FULL_ADDRESS) AS Comb,
ROW_NUMBER() OVER(PARTITION BY CONCAT(na.Full_Name,' ', ad.FULL_ADDRESS) ORDER BY na.ID) AS Row
FROM @Name na
INNER JOIN @Address ad
ON na.ID = ad.ID
WHERE na.Full_Name != ''
)
-- Row = 1 keeps one row per combination; WHERE Row > 1 would return only the extra copies (the duplicates)
SELECT ID, Full_Name, Comb FROM Comb
WHERE Row = 1
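For the GROUP BY / HAVING route the question was attempting, here is a minimal sketch using the same sample data in table variables. The idea is to group on the two real columns: a SELECT alias such as Comb cannot be referenced in WHERE, GROUP BY or HAVING, and HAVING has to come after GROUP BY.

```sql
-- Sketch: report name + address combinations that occur more than once.
-- Table variables stand in for the question's Name and Address tables.
DECLARE @Name TABLE (ID INT, Full_Name NVARCHAR(50));
DECLARE @Address TABLE (ID INT, FULL_ADDRESS NVARCHAR(100));

INSERT INTO @Name VALUES
    (1, N'Alex Zoolittle'),
    (2, N'Brian Yakami'),
    (3, N'Charles Xylogon'),
    (4, N'Brian Yakami');

INSERT INTO @Address VALUES
    (1, N'123 Westwood Way, Los Angeles, CA 95043'),
    (2, N'234 Eastwood Lane, Los Gatos, CA 95030'),
    (3, N'345 Northwood Blvd, Los Alamos, NM 83241'),
    (4, N'234 Eastwood Lane, Los Gatos, CA 95030');

-- Group on the two real columns rather than on a SELECT alias.
-- WHERE filters rows before grouping; HAVING filters the groups afterwards.
SELECT na.Full_Name,
       ad.FULL_ADDRESS,
       COUNT(*)   AS Occurrences,
       MIN(na.ID) AS FirstID,
       MAX(na.ID) AS LastID
FROM @Name AS na
INNER JOIN @Address AS ad
    ON na.ID = ad.ID
WHERE na.Full_Name <> N''
GROUP BY na.Full_Name, ad.FULL_ADDRESS
HAVING COUNT(*) > 1;
```

With this data only Brian Yakami at 234 Eastwood Lane is reported (IDs 2 and 4); MIN/MAX of the ID are just one convenient way to point back at the duplicate rows.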

Related

How to extract the names of unique columns of a table in PostgreSQL?

Let's suppose I have a table in PostgreSQL defined as:
CREATE TABLE my_table (
id serial not null primary key,
var1 text null,
var2 text null unique,
var3 text null,
var4 text null unique
);
Is there a query against information_schema that returns the names of the unique columns only? The desired response would be:
var2
var4
The query should ignore unique keys for multiple columns together.

You need information_schema.table_constraints and information_schema.constraint_column_usage:
SELECT table_schema, table_name, column_name
FROM information_schema.table_constraints AS c
JOIN information_schema.constraint_column_usage AS cc
USING (table_schema, table_name, constraint_name)
WHERE c.constraint_type = 'UNIQUE';
If you want to skip constraints with more than one column, group by constraint:
SELECT table_schema, table_name, min(column_name) AS column_name
FROM information_schema.table_constraints AS c
JOIN information_schema.constraint_column_usage AS cc
USING (table_schema, table_name, constraint_name)
WHERE c.constraint_type = 'UNIQUE'
GROUP BY table_schema, table_name, constraint_name
HAVING count(*) = 1;

aggregate multiple columns over dynamic pivot in sql

I'm creating a stored procedure that would allow the user to retrieve data from 2 tables by providing the PersonID number as a parameter.
I thought of using PIVOT to rotate the data table dynamically over multiple columns without aggregating, and to retrieve data from ONE column in a different table. The two tables below are only sample data; the real data table has over 100 columns, hence the dynamic part. The two tables don't share an ID column; they are related only through the column names.
Here are the 2 tables:
Mapping Table:
CREATE table #table (
ID varchar(10) NOT NULL,
Column_Name varchar (255) NOT NULL,
Page_Num varchar(10) NOT NULL,
Line_Num varchar(10) NOT NULL,
Element_Num varchar(10) NOT NULL
)
INSERT INTO #table (ID,Column_Name,Page_Num,Line_Num,Element_Num) VALUES ('1','Name', 'DT-01', '200','20')
INSERT INTO #table (ID,Column_Name,Page_Num,Line_Num,Element_Num) VALUES ('2','SSN', 'DT-02', '220','10')
INSERT INTO #table (ID,Column_Name,Page_Num,Line_Num,Element_Num) VALUES ('3','City', 'DT-03', '300','11')
INSERT INTO #table (ID,Column_Name,Page_Num,Line_Num,Element_Num) VALUES ('4','StreetName', 'DT-04', '350','33')
INSERT INTO #table (ID,Column_Name,Page_Num,Line_Num,Element_Num) VALUES ('5','Sex', 'DT-05', '310','51')
Creates:
ID  Column_Name  Page_Num  Line_Num  Element_Num
_________________________________________________
1   Name         DT-01     200       20
2   SSN          DT-02     220       10
3   City         DT-03     300       11
4   StreetName   DT-04     350       33
5   Sex          DT-05     310       51
Data table:
CREATE table #temp (
PersonID varchar (100) NOT NULL,
Name varchar(100) NOT NULL,
SSN varchar (255) NOT NULL,
City varchar(100) NOT NULL,
StreetName varchar(100) NOT NULL,
Sex varchar(100) NOT NULL
)
INSERT INTO #temp (PersonID,Name,SSN,City,StreetName,Sex) VALUES ('112','Joe','945890189', 'Lookesville', 'Broad st','Male')
INSERT INTO #temp (PersonID,Name,SSN,City,StreetName,Sex) VALUES ('140','Santana','514819926', 'Falls Church', 'Gane Rd', 'Female')
INSERT INTO #temp (PersonID,Name,SSN,City,StreetName,Sex) VALUES ('481','Wyatt','014523548','Gainesville', 'Westfield blvd', 'Male')
INSERT INTO #temp (PersonID,Name,SSN,City,StreetName,Sex) VALUES ('724','Brittany','551489230','Aldi', 'Ostrich rd', 'Female')
INSERT INTO #temp (PersonID,Name,SSN,City,StreetName,Sex) VALUES ('100','Giovanni','774451362','Paige', 'Company ln', 'Male')
Creates:
PersonID  Name      SSN        City          StreetName      Sex
__________________________________________________________________
112       Joe       945890189  Lookesville   Broad st        Male
140       Santana   514819926  Falls Church  Gane Rd         Female
481       Wyatt     014523548  Gainesville   Westfield blvd  Male
724       Brittany  551489230  Aldi          Ostrich rd      Female
100       Giovanni  774451362  Paige         Company ln      Male
The end result should be as follows, for example when the user enters the parameter PersonID = 140:
Column_Name  Page_Num  Line_Num  Element_Num  Data
_____________________________________________________
Name         DT-01     200       20           Santana
SSN          DT-02     220       10           514819926
City         DT-03     300       11           Falls Church
StreetName   DT-04     350       33           Gane Rd
Sex          DT-05     310       51           Female
...          ...       ...       ...          ...
and so on.

The following dynamically unpivots a data row and then joins on the field name with the definition (mapping) data.
If you want to run this query without a filter, I would suggest adding A.PersonID to the top SELECT and removing the WHERE clause.
I should add that UNPIVOT would be more performant, but with this approach there is no need to list the columns or cast the values to a common type. That said, the performance is still very respectable.
Example
Select D.*
,Data=C.Value
From #Temp A
Cross Apply (Select XMLData = cast((Select A.* For XML Raw) as xml)) B
Cross Apply (
Select Item = attr.value('local-name(.)','varchar(100)')
,Value = attr.value('.','varchar(max)')
From B.XMLData.nodes('/row') as X(r)
Cross Apply X.r.nodes('./@*') AS N(attr) -- one node per attribute, i.e. per column
) C
Join #Table D on (C.Item=D.Column_Name)
Where PersonID=140
EDIT - As a Stored Procedure
CREATE PROCEDURE [dbo].[YourProcedureName](@PersonID int)
As
Begin
Set NoCount On;
Select D.*
,Data=C.Value
From YourPersonTableName A
Cross Apply (Select XMLData = cast((Select A.* For XML Raw) as xml)) B
Cross Apply (
Select Item = attr.value('local-name(.)','varchar(100)')
,Value = attr.value('.','varchar(max)')
From B.XMLData.nodes('/row') as X(r)
Cross Apply X.r.nodes('./@*') AS N(attr)
) C
Join YourObjectTableName D on (C.Item=D.Column_Name)
Where PersonID=@PersonID
End
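Since UNPIVOT was mentioned as the faster option, here is a rough sketch of a dynamic UNPIVOT that builds its column list from the mapping table's Column_Name values. It assumes SQL Server 2017 or later (STRING_AGG; DROP TABLE IF EXISTS needs 2016) and recreates a trimmed copy of the question's #table/#temp. Every column is cast to varchar(255) because UNPIVOT requires all source columns to share one type and length (SSN is varchar(255) while the others are varchar(100)). Note that UNPIVOT silently drops NULL values.

```sql
-- Sketch: dynamic UNPIVOT driven by the mapping table (trimmed sample data).
DROP TABLE IF EXISTS #table;
DROP TABLE IF EXISTS #temp;

CREATE TABLE #table (ID varchar(10) NOT NULL, Column_Name varchar(255) NOT NULL,
    Page_Num varchar(10) NOT NULL, Line_Num varchar(10) NOT NULL, Element_Num varchar(10) NOT NULL);
INSERT INTO #table VALUES
    ('1', 'Name', 'DT-01', '200', '20'),
    ('2', 'SSN',  'DT-02', '220', '10'),
    ('3', 'City', 'DT-03', '300', '11');

CREATE TABLE #temp (PersonID varchar(100) NOT NULL, Name varchar(100) NOT NULL,
    SSN varchar(255) NOT NULL, City varchar(100) NOT NULL);
INSERT INTO #temp VALUES
    ('112', 'Joe',     '945890189', 'Lookesville'),
    ('140', 'Santana', '514819926', 'Falls Church');

DECLARE @PersonID varchar(100) = '140';
DECLARE @cols nvarchar(max), @casts nvarchar(max), @sql nvarchar(max);

-- Build "[Name], [SSN], ..." and "CAST([Name] AS varchar(255)) AS [Name], ..."
-- (cast to nvarchar(max) first so STRING_AGG is not limited to 8000 bytes).
SELECT @cols  = STRING_AGG(CAST(QUOTENAME(Column_Name) AS nvarchar(max)), N', '),
       @casts = STRING_AGG(CAST(N'CAST(' + QUOTENAME(Column_Name) + N' AS varchar(255)) AS '
                                + QUOTENAME(Column_Name) AS nvarchar(max)), N', ')
FROM #table;

SET @sql = N'
SELECT m.Column_Name, m.Page_Num, m.Line_Num, m.Element_Num, u.Data
FROM (SELECT ' + @casts + N' FROM #temp WHERE PersonID = @pid) AS s
UNPIVOT (Data FOR ColName IN (' + @cols + N')) AS u
INNER JOIN #table AS m ON m.Column_Name = u.ColName
ORDER BY CAST(m.ID AS int);';

-- #temp and #table are visible inside sp_executesql because the calling scope created them.
EXEC sp_executesql @sql, N'@pid varchar(100)', @pid = @PersonID;
```

Inside a stored procedure, @PersonID would simply become the procedure parameter and the sample-data section would go away; QUOTENAME keeps the generated column list safe from odd column names.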

How to use Common Table Expression with parameters?

I have a stored procedure with two CTEs. The second CTE has a parameter:
WITH path_sequences
AS
(
),
WITH categories
AS
(
... WHERE CategoryId = @CategoryId
-- I don't know how to get this initial parameter inside the CTE
)
SELECT * FROM path_sequences p
JOIN categories c
ON p.CategoryId = c.CategoryId
The initial parameter that I need to get inside the second CTE is p.CategoryId. How do I do that without having to create another stored procedure to contain the second CTE?
Thanks for helping

You can create a table-valued function:
create function ftCategories
(
@CategoryID int
)
returns table
as return
with categories as (
... WHERE CategoryId = @CategoryId
)
select Col1, Col2 ...
from categories
and use it as
SELECT *
FROM path_sequences p
cross apply ftCategories(p.CategoryId) c
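If the second CTE's logic is not reused elsewhere, the same per-row "parameter" can also be passed without a function, by moving that query into a correlated CROSS APPLY. A minimal sketch; @PathSequences and @Categories are hypothetical stand-ins for the real tables:

```sql
-- Sketch: p.CategoryId is passed into the inner query through CROSS APPLY.
-- @PathSequences and @Categories are hypothetical sample tables.
DECLARE @PathSequences TABLE (PathId int, CategoryId int);
DECLARE @Categories TABLE (CategoryId int, CategoryName varchar(50));

INSERT INTO @PathSequences VALUES (1, 10), (2, 20), (3, 10);
INSERT INTO @Categories VALUES (10, 'Books'), (20, 'Music'), (30, 'Games');

WITH path_sequences AS (
    SELECT PathId, CategoryId
    FROM @PathSequences
)
SELECT p.PathId, p.CategoryId, c.CategoryName
FROM path_sequences AS p
CROSS APPLY (
    -- Evaluated for each row of p, so p.CategoryId acts like a parameter here.
    SELECT cat.CategoryName
    FROM @Categories AS cat
    WHERE cat.CategoryId = p.CategoryId
) AS c
ORDER BY p.PathId;
```

For a simple lookup like this a plain JOIN is equivalent; CROSS APPLY (with or without the function) pays off when the inner query needs TOP, ORDER BY or aggregation per outer row.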

I have created a simple query using your code. You can use it like this:
DECLARE @CategoryId INT
SET @CategoryId = 1
;WITH path_sequences
AS
(
SELECT 1 CategoryId
),
categories
AS
(
SELECT 1 CategoryId WHERE 1 = @CategoryId
)
SELECT * FROM path_sequences p
JOIN categories c
ON p.CategoryId = c.CategoryId

This syntax is for External Aliases:
-- CTES With External Aliases:
WITH Sales_CTE (SalesPersonID, SalesOrderID, SalesYear)
AS
-- Define the CTE query.
(
SELECT SalesPersonID, SalesOrderID, YEAR(OrderDate) AS SalesYear
FROM Sales.SalesOrderHeader
WHERE SalesPersonID IS NOT NULL
)
The only way to add "parameters" is to use local variables declared earlier in the batch, like so:
--Declare a variable (the statement before WITH must end with a semicolon):
DECLARE @CategoryId INT;
WITH
MyCTE1 (exName1, exName2)
AS
(
SELECT <SELECT LIST>
FROM <TABLE LIST>
--Use the variable as 'a parameter'
WHERE CategoryId = @CategoryId
)

First, remove the second WITH and separate the CTEs with just a comma. Next, you can add parameters like this:
DECLARE @CategoryId INT; -- <~~ Parameter outside of CTEs
WITH
MyCTE1 (col1, col2) -- <~~ were poorly named param1 and param2 previously
AS
(
SELECT blah blah
FROM blah
WHERE CategoryId = @CategoryId
),
MyCTE2 (col1, col2) -- <~~ were poorly named param1 and param2 previously
AS
(
)
SELECT *
FROM MyCTE2
INNER JOIN MyCTE1 ON ...etc....
EDIT (and CLARIFICATION):
I have renamed the columns from param1 and param2 to col1 and col2 (which is what I meant originally).
My example assumes that each SELECT returns exactly two columns. The column list is optional when every column returned by the underlying query has a name AND those names are unique; if you do supply a list, it must contain exactly as many names as the SELECT returns.
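A small sketch of the CTE column-list rule (the names Named, Calc and Pair are made up for illustration):

```sql
-- 1) No list needed: every column already has a unique name.
WITH Named AS (
    SELECT 1 AS Id, 'Donald' AS FirstName
)
SELECT Id, FirstName FROM Named;

-- 2) List required: the expression 1 + 2 has no name of its own.
--    Without "(Total)" SQL Server reports that no column name was specified.
WITH Calc (Total) AS (
    SELECT 1 + 2
)
SELECT Total FROM Calc;

-- 3) When a list is given, it names exactly as many columns as the SELECT returns.
WITH Pair (A, B) AS (
    SELECT 1, 2
)
SELECT A, B FROM Pair;
```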
Here is another example:
Table:
CREATE TABLE Employee
(
Id INT NOT NULL IDENTITY PRIMARY KEY CLUSTERED,
FirstName VARCHAR(50) NOT NULL,
LastName VARCHAR(50) NOT NULL,
ManagerId INT NULL
)
Fill table with some rows:
INSERT INTO Employee
(FirstName, LastName, ManagerId)
VALUES
('Donald', 'Duck', 5)
INSERT INTO Employee
(FirstName, LastName, ManagerId)
VALUES
('Micky', 'Mouse', 5)
INSERT INTO Employee
(FirstName, LastName, ManagerId)
VALUES
('Daisy', 'Duck', 5)
INSERT INTO Employee
(FirstName, LastName, ManagerId)
VALUES
('Fred', 'Flintstone', 5)
INSERT INTO Employee
(FirstName, LastName, ManagerId)
VALUES
('Darth', 'Vader', null)
INSERT INTO Employee
(FirstName, LastName, ManagerId)
VALUES
('Bugs', 'Bunny', null)
INSERT INTO Employee
(FirstName, LastName, ManagerId)
VALUES
('Daffy', 'Duck', null)
CTEs:
DECLARE @ManagerId INT = 5;
WITH
MyCTE1 (col1, col2, col3, col4)
AS
(
SELECT *
FROM Employee e
WHERE 1=1
AND e.Id = @ManagerId
),
MyCTE2 (colx, coly, colz, cola)
AS
(
SELECT e.*
FROM Employee e
INNER JOIN MyCTE1 mgr ON mgr.col1 = e.ManagerId
WHERE 1=1
)
SELECT
empsWithMgrs.colx,
empsWithMgrs.coly,
empsWithMgrs.colz,
empsWithMgrs.cola
FROM MyCTE2 empsWithMgrs
Notice that the CTE columns are aliased: MyCTE1 exposes its columns as col1, col2, col3, col4, so MyCTE2 joins on mgr.col1 when it references MyCTE1. The final SELECT likewise uses MyCTE2's column names (colx, coly, colz, cola).

For anyone still struggling with this: the only thing you need to do is terminate the statement before the CTE (here, the variable declaration) with a semicolon. Nothing else is required.
DECLARE @test AS INT = 42;
WITH x
AS (SELECT @test AS 'Column')
SELECT *
FROM x
Results:
Column
-----------
42
(1 row affected)

one column split to more column sql server 2008?

Table name: Table1
id  name
1   1-aaa-14 milan road
2   23-abcde-lsd road
3   2-mnbvcx-welcoome street
I want a result like this:
Id  name  name1   name2
1   1     aaa     14 milan road
2   23    abcde   lsd road
3   2     mnbvcx  welcoome street

This function ought to give you what you need.
--Drop Function Dbo.Part
-- Returns the @Part-th token of @Value split on @Sep ('' if there is no such token)
Create Function Dbo.Part
(@Value Varchar(8000)
,@Part Int
,@Sep Char(1)='-'
)Returns Varchar(8000)
As Begin
Declare @Start Int
Declare @Finish Int
Set @Start=1
Set @Finish=CharIndex(@Sep,@Value,@Start)
-- Step forward one separator at a time until the requested token is reached
While (@Part>1 And @Finish>0)Begin
Set @Start=@Finish+1
Set @Finish=CharIndex(@Sep,@Value,@Start)
Set @Part=@Part-1
End
If @Part>1 Set @Start=Len(@Value)+1 -- Not found
If @Finish=0 Set @Finish=Len(@Value)+1 -- Last token on line
Return SubString(@Value,@Start,@Finish-@Start)
End
Usage:
Select ID
,Dbo.Part(Name,1,Default)As Name
,Dbo.Part(Name,2,Default)As Name1
,Dbo.Part(Name,3,Default)As Name2
From Dbo.Table1
It's rather compute-intensive (the scalar function is called once per row for each part), so if Table1 is very large you ought to write the results to another table, which you could refresh from time to time (perhaps once a day, at night).
Better yet, you could create a trigger, which automatically updates Table2 whenever a change is made to Table1. Assuming that column ID is primary key:
Create Table Dbo.Table2(
ID Int Constraint PK_Table2 Primary Key,
Name Varchar(8000),
Name1 Varchar(8000),
Name2 Varchar(8000))
Go
-- CREATE TRIGGER must be the first statement in its batch, hence the Go above
Create Trigger Trigger_Table1 on Dbo.Table1 After Insert,Update,Delete
As Begin
Set NoCount On;
-- Deleted/Inserted can hold many rows, so use EXISTS/IN rather than "= (subquery)"
If Exists(Select * From Deleted)
Delete From Dbo.Table2 Where ID In (Select ID From Deleted)
If Exists(Select * From Inserted)
Insert Dbo.Table2(ID, Name, Name1, Name2)
Select ID
,Dbo.Part(Name,1,Default)
,Dbo.Part(Name,2,Default)
,Dbo.Part(Name,3,Default)
From Inserted
End
Now, do your data manipulation (Insert, Update, Delete) on Table1, but do your Select statements on Table2 instead.

The solution below uses a recursive CTE to split the strings and PIVOT to display the parts in their own columns.
WITH Table1 (id, name) AS (
SELECT 1, '1-aaa-14 milan road' UNION ALL
SELECT 2, '23-abcde-lsd road' UNION ALL
SELECT 3, '2-mnbvcx-welcoome street'
),
cutpositions AS (
SELECT
id, name,
rownum = 1,
startpos = 1,
nextdash = CHARINDEX('-', name + '-')
FROM Table1
UNION ALL
SELECT
id, name,
rownum + 1,
nextdash + 1,
CHARINDEX('-', name + '-', nextdash + 1)
FROM cutpositions c
WHERE nextdash < LEN(name)
)
SELECT
id,
[1] AS name,
[2] AS name1,
[3] AS name2
/* add more columns here */
FROM (
SELECT
id, rownum,
part = SUBSTRING(name, startpos, nextdash - startpos)
FROM cutpositions
) s
PIVOT ( MAX(part) FOR rownum IN ([1], [2], [3] /* extend the list here */) ) x
Without additional modifications, this query can split names of roughly 100 parts (the limit comes from the default maximum recursion depth of 100, which can be changed with OPTION (MAXRECURSION n)), but it displays at most 3 of them. You can easily extend it to display as many parts as you want; just follow the instructions in the comments.

select T.id,
substring(T.Name, 1, D1.Pos-1) as Name,
substring(T.Name, D1.Pos+1, D2.Pos-D1.Pos-1) as Name1,
substring(T.Name, D2.Pos+1, len(T.name)) as Name2
from Table1 as T
cross apply (select charindex('-', T.Name, 1)) as D1(Pos)
cross apply (select charindex('-', T.Name, D1.Pos+1)) as D2(Pos)
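One caveat with this approach: if a row has fewer than two dashes, CHARINDEX returns 0 and SUBSTRING receives a negative length, which raises an error. A hedged variation, assuming missing parts should come back as empty strings, appends two sentinel dashes before searching (the fourth sample row is hypothetical):

```sql
-- Sketch: CROSS APPLY/CHARINDEX split that tolerates rows with fewer than two dashes.
DECLARE @Table1 TABLE (id int, Name varchar(50));
INSERT INTO @Table1 VALUES
    (1, '1-aaa-14 milan road'),
    (2, '23-abcde-lsd road'),
    (3, '2-mnbvcx-welcoome street'),
    (4, 'no dashes here');   -- hypothetical row that breaks the unpadded version

-- Searching in Name + '--' guarantees both CHARINDEX calls find a dash,
-- so every SUBSTRING length is >= 0; absent parts become ''.
SELECT T.id,
       SUBSTRING(T.Name, 1, D1.Pos - 1)                   AS Name,
       SUBSTRING(T.Name, D1.Pos + 1, D2.Pos - D1.Pos - 1) AS Name1,
       SUBSTRING(T.Name, D2.Pos + 1, LEN(T.Name))         AS Name2
FROM @Table1 AS T
CROSS APPLY (SELECT CHARINDEX('-', T.Name + '--')) AS D1(Pos)
CROSS APPLY (SELECT CHARINDEX('-', T.Name + '--', D1.Pos + 1)) AS D2(Pos)
ORDER BY T.id;
```

For the three original rows the output is unchanged; the extra row yields 'no dashes here', '' and ''.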
Testing performance of suggested solutions
Setup:
create table Table1
(
id int identity primary key,
Name varchar(50)
)
go
insert into Table1
select '1-aaa-14 milan road' union all
select '23-abcde-lsd road' union all
select '2-mnbvcx-welcoome street'
go 10000

If there will always be exactly two dashes, you can use PARSENAME. It splits on dots and numbers the parts from the right, so the dashes are replaced with dots first:
--testing table
CREATE TABLE #test(id INT, NAME VARCHAR(1000))
INSERT #test VALUES(1, '1-aaa-14 milan road')
INSERT #test VALUES(2, '23-abcde-lsd road')
INSERT #test VALUES(3, '2-mnbvcx-welcoome street')
SELECT id,PARSENAME(name,3) AS name,
PARSENAME(name,2) AS name1,
PARSENAME(name,1)AS name2
FROM (
SELECT id,REPLACE(NAME,'-','.') NAME
FROM #test)x
If the name column can contain dots, you have to replace them with another character first and turn them back into dots at the end.
For example, using a tilde as a stand-in for the dot:
INSERT #test VALUES(4, '5-mnbvcx-welcoome street.')
SELECT id,REPLACE(PARSENAME(name,3),'~','.') AS name,
REPLACE(PARSENAME(name,2),'~','.') AS name1,
REPLACE(PARSENAME(name,1),'~','.') AS name2
FROM (
SELECT id,REPLACE(REPLACE(NAME,'.','~'),'-','.') NAME
FROM #test)x

TSQL not generating a new value per row

I'm trying to anonymize all the data in my database, so I'm renaming all the people in it. I asked a similar question earlier, and was told to use NewID to force the creation of a new value per updated row, but in this situation it doesn't seem to be working.
What am I doing wrong?
-- Create temp tables for the first-name and last-name lists
CREATE TABLE #FirstName
(
ID int,
FirstName nvarchar(255) NULL,
Gender nvarchar(255) NULL
)
CREATE TABLE #LastName (
ID int,
LastName nvarchar(255)
)
-- BULK INSERT to import data from Text or CSV File
BULK INSERT #FirstName
FROM 'C:\Users\jhollon\Desktop\tmp\names\firstnames.lined.txt'
WITH
(
FIRSTROW = 1,
FIELDTERMINATOR = ',',
ROWTERMINATOR = '\n'
)
BULK INSERT #LastName
FROM 'C:\Users\jhollon\Desktop\tmp\names\lastnames.lined.txt'
WITH
(
FIRSTROW = 1,
FIELDTERMINATOR = ',',
ROWTERMINATOR = '\n'
)
/*SELECT FirstName FROM #FirstName WHERE ID = (
SELECT RandomNumber FROM (
SELECT ABS(CHECKSUM(NewID())) % 1500 AS RandomNumber FROM tblTenant WHERE Sex = '1'
) AS A
);*/
UPDATE tblTenant SET TenantName = (
SELECT LastName + ', ' + FirstName FROM
(SELECT UPPER(FirstName) as FirstName FROM #FirstName WHERE ID = (SELECT ABS(CHECKSUM(NewID())) % 500 + 1501)) AS A,
(SELECT LastName FROM #LastName WHERE ID = (SELECT ABS(CHECKSUM(NewID())) % 200 + 1)) as B
) WHERE Sex = '2';
UPDATE tblTenant SET TenantName = (
SELECT LastName + ', ' + FirstName FROM
(SELECT UPPER(FirstName) as FirstName FROM #FirstName WHERE ID = (SELECT ABS(CHECKSUM(NewID())) % 500 + 1)) AS A,
(SELECT LastName FROM #LastName WHERE ID = (SELECT ABS(CHECKSUM(NewID())) % 200 + 1)) as B
) WHERE Sex = '1';
DROP TABLE #FirstName;
DROP TABLE #LastName;

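Correct. The subquery is evaluated only once, which is as advertised: an uncorrelated scalar subquery can be cached and reused for every row ("cachable scalar subquery").
Try this instead, which moves the NEWID() lookup into a derived table via CROSS APPLY; the dummy predicate on T.ID correlates it with the outer row, so it is re-evaluated for each tenant: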
UPDATE T
SET
TenantName = L.LastName + ', ' + F.FirstName
FROM
tblTenant T
CROSS APPLY
(SELECT TOP 1 UPPER(FirstName) as FirstName FROM #FirstName
WHERE CHECKSUM(NEWID()) <> T.ID
ORDER BY NEWID()) F
CROSS APPLY
(SELECT TOP 1 LastName FROM #LastName
WHERE CHECKSUM(NEWID()) <> T.ID
ORDER BY NEWID()) L

I'm not sure I understand your question, but if you want the ID values to be unique, you can make ID an identity column.
Ex:
[ID] [int] IDENTITY(1,1) NOT NULL

The code below demonstrates that, without a genuine inner-to-outer correlation, the new name is not guaranteed to differ from the old name when using the CROSS APPLY answer above.
Within the FirstName CROSS APPLY, a real correlation such as WHERE F.Id <> T.Id ORDER BY NEWID() would be better.
USE tempdb
GO
IF OBJECT_ID('tblTenant') IS NOT NULL
DROP TABLE tblTenant
GO
CREATE TABLE tblTenant
(
Id int,
FirstName nvarchar(20),
LastName nvarchar(20),
Gender bit
)
INSERT INTO tblTenant
VALUES (1, 'Bob' , 'Marley', 1),
(2, 'Boz' , 'Skaggs', 1)
SELECT DISTINCT FirstName
INTO #FirstNames
FROM tblTenant
SELECT DISTINCT LastName
INTO #LastNames
FROM tblTenant
-- There is a probability > 0 that a tenant's new name = the tenant's old name
SELECT
OldFirst = T.FirstName,
OldLast = T.LastName,
NewFirst = F.FirstName,
NewLast = L.LastName
FROM
tblTenant T
CROSS APPLY
(
SELECT TOP 1 UPPER(FirstName) AS FirstName
FROM #FirstNames
WHERE CHECKSUM(NEWID()) <> T.ID
ORDER BY NEWID()
) F
CROSS APPLY
(
SELECT TOP 1 LastName
FROM #LastNames
WHERE CHECKSUM(NEWID()) <> T.ID
ORDER BY NEWID()
) L
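Following that suggestion, here is a minimal sketch that correlates each CROSS APPLY with the tenant's actual name, using table variables in place of tblTenant and the temp tables. Because the predicate references the outer row, the subquery is evaluated per tenant, and the new name cannot equal the old one as long as some other name exists in the list:

```sql
-- Sketch: a real correlation (name <> current name) instead of CHECKSUM(NEWID()) <> T.ID.
DECLARE @Tenant TABLE (Id int, FirstName nvarchar(20), LastName nvarchar(20));
INSERT INTO @Tenant VALUES (1, N'Bob', N'Marley'), (2, N'Boz', N'Skaggs');

DECLARE @FirstNames TABLE (FirstName nvarchar(20));
DECLARE @LastNames TABLE (LastName nvarchar(20));
INSERT INTO @FirstNames SELECT DISTINCT FirstName FROM @Tenant;
INSERT INTO @LastNames SELECT DISTINCT LastName FROM @Tenant;

SELECT T.Id,
       OldFirst = T.FirstName,
       OldLast  = T.LastName,
       NewFirst = F.FirstName,
       NewLast  = L.LastName
FROM @Tenant AS T
CROSS APPLY (
    SELECT TOP (1) UPPER(fn.FirstName) AS FirstName
    FROM @FirstNames AS fn
    WHERE fn.FirstName <> T.FirstName   -- excludes the tenant's current first name
    ORDER BY NEWID()                    -- random pick among the remaining names
) AS F
CROSS APPLY (
    SELECT TOP (1) ln.LastName
    FROM @LastNames AS ln
    WHERE ln.LastName <> T.LastName     -- excludes the tenant's current last name
    ORDER BY NEWID()
) AS L;
```

With CROSS APPLY, a tenant for whom no other name is available drops out of the result; OUTER APPLY would keep that row with NULLs instead. Name comparisons follow the column collation, so under a case-insensitive collation 'bob' and 'Bob' count as the same name.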
