Find rows that are "similar", and in which columns - tsql

I'm quite new to SQL Server (2017) and I have this kind of need:
Consider this record:
╔═════════════╦═══════════════╦══════════════╦═══════╗
║ Surname     ║ Name          ║ Day of birth ║ City  ║
╠═════════════╬═══════════════╬══════════════╬═══════╣
║ Rivers Lara ║ Wanda Leticia ║ 07/04/1956   ║ Paris ║
╚═════════════╩═══════════════╩══════════════╩═══════╝
I have to find all the matching records in the following list, highlighting the type of match:
╔═════════════╦═══════════════╦══════════════╦════════╗
║ Surname     ║ Name          ║ Day of birth ║ City   ║
╠═════════════╬═══════════════╬══════════════╬════════╣
║ Rivers Lara ║ Wanda Leticia ║ 07/04/1956   ║ London ║
║ Rivers      ║ Leticia       ║ 07/04/1956   ║ Rome   ║
║ Rivers      ║ Leticia       ║ 14/03/1995   ║ Rome   ║
║ Rivers Lara ║ Leticia       ║ 07/04/1956   ║ Paris  ║
║ Rivers Lara ║ Wanda Leticia ║ 08/07/1983   ║ Paris  ║
╚═════════════╩═══════════════╩══════════════╩════════╝
For example:
The 1st row matches on Surname + Name + day of birth,
the 2nd on part of Surname + part of Name + day of birth,
the 3rd on part of Surname + part of Name,
the 4th on Surname + part of Name + day of birth + City,
and so on...
Any ideas on how to approach this type of query would be appreciated, considering also that the set of possible matches is fixed at the moment, but could grow in the future (maybe adding more columns like Tax number or others).

Assuming the presentation layer is HTML and you're OK with bits of HTML in the query output, this is a rough idea. It works, though it's not precisely efficient, and it does no partial matches, only exact ones. To match partially you'll need charindex() or patindex() and to split on ' ' with left() or right(), which can get convoluted.
One split into the left/right word looks like this (at least, this is still the way I do splitting):
--this is only an example of the convoluted nature of string manipulation.
declare @Surname varchar(128) = 'Rivers Lara';
select
    rtrim(iif(charindex(' ', @Surname) = 0, @Surname, left(@Surname, charindex(' ', @Surname)))) first_part_Surname
    ,ltrim(reverse(iif(charindex(' ', @Surname) = 0, reverse(@Surname), left(reverse(@Surname), charindex(' ', reverse(@Surname)))))) last_part_Surname
declare @StartRed varchar(50) = '<span style="color: red;">'
    ,@StopRed varchar(50) = '</span>';
select
    case when tm.Surname = tr.Surname then @StartRed + tr.Surname + @StopRed else tr.Surname end Surname
    ,case when tm.Name = tr.Name then @StartRed + tr.Name + @StopRed else tr.Name end [Name]
    ,case when tm.[Day of Birth] = tr.[Day of Birth] then @StartRed + convert(varchar, tr.[Day of Birth], 1) + @StopRed else convert(varchar, tr.[Day of Birth], 1) end [Day of Birth]
    ,case when tm.City = tr.City then @StartRed + tr.City + @StopRed else tr.City end City
from TableMatch tm
inner join TableRecords tr on (tm.Surname = tr.Surname or tm.Name = tr.Name)
    and (tm.[Day of Birth] = tr.[Day of Birth] or tm.City = tr.City)
-- requires either Surname or Name to match, and at least 1 of the other 2 to match
Additionally, you may be able to use soundex() to find names that sound like other names, as a stop-gap without any string manipulation. You can also left() the soundex() value to get broader and broader matches, though if you go all the way to left(soundex(name), 1) you'll end up with all names that start with the same first letter.
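As a rough sketch of that widening (reusing the TableMatch/TableRecords names assumed above):
-- soundex() returns a 4-character code: the first letter plus three digits.
-- Comparing shorter prefixes of the code gives broader and broader matches.
select tr.Surname
    ,soundex(tr.Surname) surname_code
from TableMatch tm
inner join TableRecords tr
    -- left(code, 2) keeps the first letter and one digit: broader than the full code
    on left(soundex(tm.Surname), 2) = left(soundex(tr.Surname), 2);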

In addition to Andrew's comment, you can also approach it with a single self-join with an OR-linked condition for each column you want to check:
ON a.Col1 = b.Col1
OR a.Col2 = b.Col2
OR a.Col3 = b.Col3
etc...
Then the extra column you want, with the kind of matching, would be a massive CASE expression with a WHEN..THEN for every possible combination that you would want to see in this column.
WHEN a.Col1 = b.Col1 AND a.Col2 <> b.Col2 AND a.Col3 <> b.Col3 THEN 'Col1'
WHEN a.Col1 = b.Col1 AND a.Col2 = b.Col2 AND a.Col3 <> b.Col3 THEN 'Col1, Col2'
etc...
In the above example, I am assuming none of the columns can contain NULL, but if they can you'll have to handle that in the logic as well.
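Applied to the question's columns, a minimal sketch of this approach (again assuming the TableMatch/TableRecords names from the earlier answer, and ignoring NULLs and partial matches):
select tr.*
    ,case
        when tm.Surname = tr.Surname and tm.Name = tr.Name
             and tm.[Day of Birth] = tr.[Day of Birth] and tm.City = tr.City
            then 'Surname, Name, Day of Birth, City'
        when tm.Surname = tr.Surname and tm.Name = tr.Name
             and tm.[Day of Birth] = tr.[Day of Birth]
            then 'Surname, Name, Day of Birth'
        when tm.Surname = tr.Surname and tm.Name = tr.Name
            then 'Surname, Name'
        -- ...one WHEN per combination you care about...
        else 'other combination'
     end as MatchType
from TableMatch tm
inner join TableRecords tr
    on tm.Surname = tr.Surname
    or tm.Name = tr.Name
    or tm.[Day of Birth] = tr.[Day of Birth]
    or tm.City = tr.City;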

Related

PostgreSQL - make two selects in two columns

How can I add ORDER BY, GROUP BY, and LIMIT for each arr in the query?
users table:
╔════╦══════════════╗
║ id ║ name         ║
╠════╬══════════════╣
║ 1  ║ Jeff Atwood  ║
║ 2  ║ Geoff Dalgas ║
║ 3  ║ Jarrod Dixon ║
║ 4  ║ Joel Spolsky ║
╚════╩══════════════╝
Example query, without the per-array ordering and limit:
SELECT json_agg(u) FILTER (WHERE id > 1) AS arr1,
       json_agg(u) FILTER (WHERE id < 3) AS arr2
FROM users u
Expected:
╔════════════════════╦════════════════════╗
║ arr1               ║ arr2               ║
╠════════════════════╬════════════════════╣
║ [{id:1, name: ''}, ║ [{id:1, name: ''}, ║
║  {id:2, name: ''}] ║  {id:2, name: ''}] ║
╚════════════════════╩════════════════════╝
Example of the query clauses I want to apply for one array:
SELECT *
FROM ps
LEFT JOIN u on u.id = ps.id
WHERE ps.date <= now()
GROUP BY ps.id
ORDER BY ps.date DESC
ORDER BY
You can put ORDER BY within the aggregate function like this:
SELECT json_agg(users.* order by id desc) FILTER (WHERE id > 1 ) AS arr1,
json_agg(users.*) FILTER (WHERE id < 2) AS arr2
FROM users;
As far as I understand, LIMIT cannot be used this way, i.e. per array.
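A possible workaround (a sketch against the users table above) is to apply ORDER BY and LIMIT in a subquery and aggregate its rows instead:
-- Each scalar subquery builds one array; ORDER BY and LIMIT live in the inner query.
SELECT (SELECT json_agg(t)
        FROM (SELECT id, name FROM users
              WHERE id > 1 ORDER BY id DESC LIMIT 2) t) AS arr1,
       (SELECT json_agg(t)
        FROM (SELECT id, name FROM users
              WHERE id < 3 ORDER BY id LIMIT 1) t) AS arr2;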

MariaDB variables in SELECT statements - Postgres equivalent

In MariaDB/MySQL I can use variables in the following way to perform calculations in a SELECT statement. In this simple example, I create a range of dates and use variables to calculate a simple opening balance and closing balance, with a payment of 10 each day.
with RECURSIVE dates as (
    select '2017-11-01' as `dt`
    union all
    select adddate(dt, INTERVAL 1 DAY)
    from dates
    where dt < CURDATE()
)
select
    @vardate := d.dt
    ,@openbal
    ,@payment := 10
    ,@closebal := @openbal + @payment
    ,@openbal := @closebal
from dates d;
gives the results:
╔══════════════╦════════════╦═══════════════╦═════════════╗
║ "ac_date"    ║ "open_bal" ║ "trans_total" ║ "close_bal" ║
╠══════════════╬════════════╬═══════════════╬═════════════╣
║ "2017-11-01" ║ "0"        ║ "10"          ║ "10"        ║
║ "2017-11-02" ║ "10"       ║ "10"          ║ "20"        ║
║ "2017-11-03" ║ "20"       ║ "10"          ║ "30"        ║
...
Using this technique I can perform simple calculations on the fly in a select statement. My question is, is it possible to use variables in this way in a PL/pgSQL function or is there an alternative method I am overlooking?
I am not entirely sure how that statement works in MariaDB, but this seems to do the same thing:
with vars (openbal, payment) as (
    values (0, 10)
), balance as (
    select t.dt::date as ac_date,
        openbal,
        payment,
        sum(payment) over (order by t.dt) as close_bal
    from vars,
        generate_series(date '2017-11-01', current_date, interval '1' day) as t(dt)
)
select ac_date,
    openbal + coalesce(lag(close_bal) over (order by ac_date), 0) as open_bal,
    payment,
    close_bal
from balance;
In general, to get a running total you use sum() over (order by ...) in SQL. To access values from a previous row, you use the lag() function (with coalesce() supplying the starting value on the first row).
The two CTEs are needed because window functions can't be nested.
To generate a list of rows, use generate_series() in Postgres.

complex SQL INSERT not working

I am working on migrating data from 1 row by X columns in one table to X rows in another table. In the first table, multiple boolean records AF.AdditionalFieldsBoolean15 (and 14, 13, 12, etc) are stored in a single row. I need to create a single row to represent each 'true' boolean column.
I have been working with the following code, which executes flawlessly, except that it only inserts 1 record. When I run the SELECT statement in the second code block, I return 12 records. How do I ensure that I am iterating over all records?
BEGIN TRAN TEST
DECLARE @id uniqueidentifier = NEWID()
DECLARE @dest uniqueidentifier = 'AC34C8E5-8859-4E74-ACF2-54B3804AE9C9'
DECLARE @person uniqueidentifier
SELECT @person = GP.Oid FROM <Redacted>.GenericPerson AS GP
INNER JOIN <Redacted>.AdditionalFields AS AF
ON GP.AdditionalFields = AF.Oid
WHERE AF.AdditionalBoolean15 = 'true'
INSERT INTO <Redacted>.Referral (Oid, Destination, Person)
OUTPUT INSERTED.*
VALUES (@id, @dest, @person)
Select statement that returns 12 records
SELECT *
FROM WS_Live.dbo.GenericPerson AS GP
INNER JOIN WS_Live.dbo.AdditionalFields AS AF
ON GP.AdditionalFields = AF.Oid
WHERE AF.AdditionalBoolean15 = 'true'
--------------SOLUTION (EDIT)--------------------
Thanks to M.Ali I was able to muddle through pivot tables and worked out the following solution. Just wanted to post with some explanation in case anyone needs this in the future.
INSERT INTO <Redacted>.Referral (Person, Destination, Oid)
OUTPUT INSERTED.*
SELECT Person
, Destination
, NEWID() AS Oid
FROM
(
SELECT
GP.Oid AS Person,
AF.AdditionalBoolean15 AS 'AC34C8E5-8859-4E74-ACF2-54B3804AE9C9',
AF.AdditionalBoolean14 AS '7DE4B414-42E0-4E39-9432-6DC9F60A5512',
AF.AdditionalBoolean8 AS '5760A126-AD15-4FF4-B608-F1C4220C7087',
AF.AdditionalBoolean13 AS '4EFFB0FB-BB6C-4425-9653-D482B6C827AC',
AF.AdditionalBoolean17 AS '0696C571-EEFA-4FE6-82DA-4FF6AB96CC98',
AF.AdditionalBoolean4 AS 'FF381D63-A76C-46F1-8E2C-E2E3C69365BF',
AF.AdditionalBoolean20 AS 'C371E419-4E34-4F46-B07D-A4533491D944',
AF.AdditionalBoolean16 AS '1F0D1221-76D7-4F1F-BB7A-818BB26E0590',
AF.AdditionalBoolean18 AS 'C6FD53A8-37B9-4519-A825-472722A158C9',
AF.AdditionalBoolean19 AS 'BEBD6ED6-AF0A-4A05-A1C1-060B2926F83E'
FROM <Redacted>.GenericPerson GP
INNER JOIN <Redacted>.AdditionalFields AF
ON GP.AdditionalFields = AF.Oid
)AS cp
UNPIVOT
(
Bool FOR Destination IN ([AC34C8E5-8859-4E74-ACF2-54B3804AE9C9],
[7DE4B414-42E0-4E39-9432-6DC9F60A5512],
[5760A126-AD15-4FF4-B608-F1C4220C7087],
[4EFFB0FB-BB6C-4425-9653-D482B6C827AC],
[0696C571-EEFA-4FE6-82DA-4FF6AB96CC98],
[FF381D63-A76C-46F1-8E2C-E2E3C69365BF],
[C371E419-4E34-4F46-B07D-A4533491D944],
[1F0D1221-76D7-4F1F-BB7A-818BB26E0590],
[C6FD53A8-37B9-4519-A825-472722A158C9],
[BEBD6ED6-AF0A-4A05-A1C1-060B2926F83E])
)AS up
WHERE Bool = 'true'
ORDER BY Person, Destination
First of all, I'm not sure why this SELECT NEWID() at the top worked, where I have received errors when trying to SELECT NEWID() before.
I felt a little creative using statements like
AF.AdditionalBoolean19 AS 'BEBD6ED6-AF0A-4A05-A1C1-060B2926F83E'
because the table I was inserting into required a GUID from another table that represents a plaintext 'Name'. There is no table linking each column name to that GUID, so I think this was the best way, but I would like to hear if anyone can think of a better one.
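For what it's worth, one alternative sketch: keep the column-name-to-GUID mapping as data in a hypothetical ColumnDestinationMap table, unpivot on the natural column names, and join to the map instead of aliasing each column:
-- Hypothetical mapping table (does not exist in the original schema)
CREATE TABLE ColumnDestinationMap (
    ColumnName  varchar(50) PRIMARY KEY,
    Destination uniqueidentifier NOT NULL
);
INSERT INTO ColumnDestinationMap (ColumnName, Destination) VALUES
('AdditionalBoolean15', 'AC34C8E5-8859-4E74-ACF2-54B3804AE9C9'),
('AdditionalBoolean14', '7DE4B414-42E0-4E39-9432-6DC9F60A5512');
-- ...one row per boolean column...

-- Then unpivot on the real column names and join:
-- UNPIVOT (Bool FOR ColumnName IN (AdditionalBoolean15, AdditionalBoolean14, ...)) AS up
-- INNER JOIN ColumnDestinationMap m ON m.ColumnName = up.ColumnName
-- WHERE up.Bool = 'true'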
A demo of how you can unpivot your AdditionalBooleanN columns: rather than doing it row by row, just use a WHERE clause to filter the result and insert into the intended destination table.
Test Data
DECLARE @TABLE TABLE
(ID INT , dest INT, Person INT, Bol1 INT, Bol2 INT, Bol3 INT)
INSERT INTO @TABLE VALUES
(1 , 100 , 1 , 1 , 1 , 1) ,
(2 , 200 , 2 , 1 , 1 , 0) ,
(3 , 300 , 3 , 1 , 0 , 0) ,
(4 , 400 , 4 , 0 , 0 , 0)
Query
-- INSERT INTO Destination_Table (ID , Dest, Person, bol_Column)
SELECT * --<-- Only select columns that needs to be inserted
FROM #TABLE t
UNPIVOT ( Value FOR Bool_Column IN (Bol1, Bol2, Bol3) )up
-- WHERE Bool_Column = ??
Result
╔════╦══════╦════════╦═══════╦═════════════╗
║ ID ║ dest ║ Person ║ Value ║ Bool_Column ║
╠════╬══════╬════════╬═══════╬═════════════╣
║ 1  ║ 100  ║ 1      ║ 1     ║ Bol1        ║
║ 1  ║ 100  ║ 1      ║ 1     ║ Bol2        ║
║ 1  ║ 100  ║ 1      ║ 1     ║ Bol3        ║
║ 2  ║ 200  ║ 2      ║ 1     ║ Bol1        ║
║ 2  ║ 200  ║ 2      ║ 1     ║ Bol2        ║
║ 2  ║ 200  ║ 2      ║ 0     ║ Bol3        ║
║ 3  ║ 300  ║ 3      ║ 1     ║ Bol1        ║
║ 3  ║ 300  ║ 3      ║ 0     ║ Bol2        ║
║ 3  ║ 300  ║ 3      ║ 0     ║ Bol3        ║
║ 4  ║ 400  ║ 4      ║ 0     ║ Bol1        ║
║ 4  ║ 400  ║ 4      ║ 0     ║ Bol2        ║
║ 4  ║ 400  ║ 4      ║ 0     ║ Bol3        ║
╚════╩══════╩════════╩═══════╩═════════════╝

Tsql denormalize a table with text values

I have a table like
UserID  Attribute    Value
1       Name         Peter
1       Sex          male
1       Nationality  UK
2       Name         Sue
And need a result like
UserId  Name   Sex   Nationality  .....
1       Peter  male  UK
2       Sue    .....
Looks like a crosstab - in MS Access it works if I take First(Value) - but in T-SQL I cannot get it to work with Value being a text field.
Any ideas?
DECLARE @TABLE TABLE (UserID INT, Attribute VARCHAR(20), Value VARCHAR(20))
INSERT INTO @TABLE VALUES
(1,'Name','Peter'),
(1,'Sex','male'),
(1,'Nationality','UK'),
(2,'Name','Sue')

-- MAX() works on character data too, so it stands in for Access's First()
SELECT * FROM @TABLE
PIVOT ( MAX(Value)
FOR Attribute
IN ([Name], [Sex], [Nationality])) P
Result Set
╔════════╦═══════╦══════╦═════════════╗
║ UserID ║ Name  ║ Sex  ║ Nationality ║
╠════════╬═══════╬══════╬═════════════╣
║ 1      ║ Peter ║ male ║ UK          ║
║ 2      ║ Sue   ║ NULL ║ NULL        ║
╚════════╩═══════╩══════╩═════════════╝

PostgreSQL - How to replace empty field with any string?

I'm trying to replace some empty (NULL) fields, which I get as a result of my query, with any string I want. Those empty fields are in a "timestamp without time zone" column. So I tried the COALESCE function, but with no result (I got the error: invalid input syntax for timestamp: "any_string"):
select column1, coalesce(date_trunc('seconds', min(date)), 'any_string') as column2
What could be wrong?
Table:
╔════╦═════════════════════╦═════════════════════╗
║ id ║ date                ║ date2               ║
╠════╬═════════════════════╬═════════════════════╣
║ 1  ║ 2013-12-17 13:54:59 ║ 2013-12-17 09:03:31 ║
║ 2  ║ 2013-12-17 13:55:07 ║ 2013-12-17 09:59:11 ║
║ 3  ║ 2013-12-17 13:55:56 ║ empty field         ║
║ 4  ║ 2013-12-17 13:38:37 ║ 2013-12-17 09:14:01 ║
║ 5  ║ 2013-12-17 13:54:46 ║ empty field         ║
║ 6  ║ 2013-12-17 13:54:46 ║ empty field         ║
║ 7  ║ 2013-12-17 13:55:40 ║ empty field         ║
╚════╩═════════════════════╩═════════════════════╝
Sample query:
select q1.id, q2.date, q3.date2
from (select distinct id from table1) q1
left join (select id, date_trunc('seconds', max(time)) as date from table2 where time::date = now()::date group by id) q2 on q1.id = q2.id
left join (select id, date_trunc('seconds', min(time2)) as date2 from table1 where time2::date = now()::date group by id) q3 on q1.id = q3.id
order by 1
And the matter is to replace those empty fields above with any string I imagine.
You can simply cast timestamp to text using ::text
select column1, coalesce(date_trunc('seconds', min(date))::text, 'any_string') as column2
The date_trunc() function returns a timestamp, so you cannot fit a string like any_string into the same column.
You'll have to pick a format and convert the resulting date to a string, though of course it will no longer be usable as a date.
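For example, a minimal sketch using to_char() to pick an explicit format (applied to a relation shaped like the sample output above; the names and the placeholder string are illustrative):
-- to_char() returns NULL for NULL input, so COALESCE falls through to the placeholder
select id,
       coalesce(to_char(date_trunc('seconds', date2), 'YYYY-MM-DD HH24:MI:SS'),
                'empty field') as date2_text
from table1;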
Note that COALESCE is not limited to the integer data type; it works on any type, but all of its arguments must be of compatible types, which is why a timestamp and a plain string cannot be mixed.
In one case I converted a varchar column to an integer inside COALESCE using the column_name::integer syntax.
In your case, though, the cast has to go the other way: convert the timestamp to text (as shown above) rather than the string to a timestamp.