How to separate text using substring - postgresql

I was wondering how can I separate a column containing the following:
BURGER, Petrus (CHV 494081)
Into 3 columns:
FirstName, LastName, ID

SELECT
a[2] AS FirstName,
a[1] AS LastName,
a[3] AS ID
FROM (
SELECT regexp_matches(column_name, '(.+), (.+) \((.+)\)')
FROM table_name
) t(a)
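To sanity-check what the three capture groups hold before running this against a real table, the same pattern can be tried outside the database. This is only a sketch: split_name is a made-up helper name, and Python's re module stands in for PostgreSQL's regex engine, which I would expect to treat this simple pattern the same way.

```python
import re

# Same pattern as in the query above: "LASTNAME, Firstname (ID)".
PATTERN = re.compile(r"(.+), (.+) \((.+)\)")

def split_name(value):
    """Return (first_name, last_name, id), or None when the value does not match."""
    match = PATTERN.search(value)
    if match is None:
        return None
    # Group 1 is the last name, group 2 the first name, group 3 the id,
    # which is why the query selects a[2], a[1], a[3] in that order.
    last_name, first_name, ident = match.groups()
    return first_name, last_name, ident

print(split_name("BURGER, Petrus (CHV 494081)"))
# ('Petrus', 'BURGER', 'CHV 494081')
```

Note that values which do not match the pattern return None here; in the SQL version, regexp_matches simply produces no row for them.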

Related

How to manage NULL strings or dates in sql queries (PostgreSQL)

PostgreSQL 11.1
In the SQL query below, $1 and $2 are strings and $3 is a timestamp. How can the query be rewritten so that a null value in $3 allows every date to be selected (not just null dates)?
SELECT lastname, firstname, birthdate FROM patients
WHERE UPPER(lastname) LIKE UPPER($1)||'%' and UPPER(firstname) LIKE UPPER($2)||'%' AND birthdate::date = $3::date
UNION
SELECT lastname, firstname, birthdate FROM appointment_book
WHERE UPPER(lastname) LIKE UPPER($1)||'%' and UPPER(firstname) LIKE UPPER($2)||'%' and birthdate::date = $3::date
That is, if $3 is null, then this should reduce to:
SELECT lastname, firstname, birthdate FROM patients
WHERE UPPER(lastname) LIKE UPPER($1)||'%' and UPPER(firstname) LIKE UPPER($2)||'%'
UNION
SELECT lastname, firstname, birthdate FROM appointment_book
WHERE UPPER(lastname) LIKE UPPER($1)||'%' and UPPER(firstname) LIKE UPPER($2)||'%'
Untested, but I think you can handle that with a CASE expression:
SELECT lastname, firstname, birthdate FROM patients p
WHERE UPPER(p.lastname) LIKE UPPER($1)||'%'
AND UPPER(p.firstname) LIKE UPPER($2)||'%'
AND (CASE WHEN $3 IS NULL THEN TRUE
ELSE p.birthdate::date = $3::date
END)
UNION
SELECT lastname, firstname, birthdate FROM appointment_book ab
WHERE UPPER(ab.lastname) LIKE UPPER($1)||'%'
AND UPPER(ab.firstname) LIKE UPPER($2)||'%'
AND (CASE WHEN $3 IS NULL THEN TRUE
ELSE ab.birthdate::date = $3::date
END);
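The "null means no date filter" idea can be tried on a small in-memory table. The sketch below is an assumption-heavy stand-in, not the PostgreSQL query itself: it uses Python's sqlite3 module, a made-up helper find_patients, invented sample rows, and the equivalent form (:bd IS NULL OR ...) instead of CASE.

```python
import sqlite3

def find_patients(conn, last_prefix, first_prefix, birthdate=None):
    """Prefix search on names; the birthdate filter is skipped when birthdate is None."""
    sql = """
        SELECT lastname, firstname, birthdate
        FROM patients
        WHERE UPPER(lastname) LIKE UPPER(:ln) || '%'
          AND UPPER(firstname) LIKE UPPER(:fn) || '%'
          AND (:bd IS NULL OR date(birthdate) = date(:bd))
        ORDER BY lastname, firstname
    """
    params = {"ln": last_prefix, "fn": first_prefix, "bd": birthdate}
    return conn.execute(sql, params).fetchall()

# Invented sample data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE patients (lastname TEXT, firstname TEXT, birthdate TEXT)")
conn.executemany(
    "INSERT INTO patients VALUES (?, ?, ?)",
    [
        ("Smith", "Anna", "1980-01-02"),
        ("Smith", "Andrew", "1975-06-30"),
        ("Jones", "Ann", "1980-01-02"),
    ],
)

print(find_patients(conn, "smi", "an"))                # both Smith rows
print(find_patients(conn, "smi", "an", "1980-01-02"))  # only Anna Smith
```

When the date parameter is NULL the last condition is true for every row, so the query reduces to the name filters alone.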

UPDATE statement using two arrays at the same index in WHERE clause

I am trying to update a table, entities, with a column, contacts, that is an array of ids from another table, contacts. The contacts table has the columns first_name and last_name, and I have an array of first names, firstNames, and an array of last names, lastNames, to pass in.
How would you update the contacts column in the entities table with one query that properly gets all of the contacts with first name firstNames[0] AND last name lastNames[0], and all of the contacts with first name firstNames[1] AND last name lastNames[1], and [...] all of the contacts with first name firstNames[n] AND last name lastNames[n]?
My initial thought was something like UPDATE entities SET contacts = (SELECT id FROM contacts WHERE first_name = ANY(firstNames) AND last_name = ANY(lastNames)).
The problem with this arises when the contacts table is like this:
first_name | last_name
----------------------
Bob | Jones
Bob | Miller
David | Miller
If I wanted to set the contacts column to the ids for Bob Jones and David Miller, but NOT Bob Miller, and I passed in ['Bob', 'David'] for firstNames and ['Jones', 'Miller'] for lastNames in the above query, Bob Miller would also get added to the contacts column.
Maybe you are looking for something like this:
WITH x AS (
SELECT 'Bob'::text AS firstName, 'Jones'::text AS lastName
UNION SELECT 'David', 'Miller'
UNION SELECT 'Bob', 'Miller'
)
SELECT *
FROM x
WHERE (firstName, lastName) = ANY (ARRAY [
('Bob'::text, 'Jones'::text),
('David'::text, 'Miller'::text)
]);
Yet another way:
WITH x AS (
SELECT 'Bob'::text AS firstName, 'Jones'::text AS lastName
UNION SELECT 'David', 'Miller'
UNION SELECT 'Bob', 'Miller'
)
SELECT *
FROM x
WHERE EXISTS (
SELECT 1
FROM (SELECT ARRAY [
['Bob', 'Jones'],
['David', 'Miller']]::text[][] AS n
) AS n
JOIN LATERAL generate_series(1, array_upper(n, 1)) AS i ON true
WHERE firstName = n[i][1]
AND lastName = n[i][2]
);
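The difference between matching the two arrays independently and matching them pairwise by index can be seen on the three sample rows from the question. This is a plain Python sketch, not SQL; the id values 1 to 3 are made up for illustration.

```python
# (id, first_name, last_name); the ids are invented for this example.
contacts = [
    (1, "Bob", "Jones"),
    (2, "Bob", "Miller"),
    (3, "David", "Miller"),
]
first_names = ["Bob", "David"]
last_names = ["Jones", "Miller"]

# Independent matching, like first_name = ANY(...) AND last_name = ANY(...):
independent_ids = [
    cid for cid, first, last in contacts
    if first in first_names and last in last_names
]

# Pairwise matching, like (first_name, last_name) = ANY(ARRAY[(...), (...)]):
wanted_pairs = set(zip(first_names, last_names))
paired_ids = [
    cid for cid, first, last in contacts
    if (first, last) in wanted_pairs
]

print(independent_ids)  # [1, 2, 3]  Bob Miller slips in
print(paired_ids)       # [1, 3]
```

Comparing whole (first, last) pairs is what keeps Bob Miller out, which is the effect both queries above aim for.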

Am able to parse the first and last name, from full name, how do I parse the Middle Name?

I am able to parse the first and last name from the full name, but how do I parse the middle name? No titles are used, such as 'MR', 'MS', 'DR', 'FR', 'MRS', 'LRD', 'SIR', 'LORD', 'LADY', 'MISS', 'PROF', so I think I can use SUBSTRING. The name format can be firstname middlename lastname, or firstname lastname, separated by spaces.
UPDATE p
SET p.LAST_NAME = c.LASTNAME --tested that join is correct, contact name is combined, will need to parse it out ***, need to reference inserted
--Need FIRST_NAME, MIDDLE_NAME, LAST_NAME
p.FIRST_NAME = SUBSTRING(c.CONTACT, 1, CHARINDEX(' ', c.CONTACT) - 1) AS FirstName,
p.MIDDLE_NAME = --need middle name
p.LAST_NAME = SUBSTRING(CONTACT, CHARINDEX(' ', CONTACT) + 1, len(CONTACT)) AS LastName
FROM GMUnitTest.dbo.CONTACT1 c
JOIN PCUnitTest.dbo.PEOPLE p
ON p.PEOPLE_ID = c.KEY4
WHERE c.Key1 = '31';
Based on what you said, that there must be a middle name, you can use something like this:
declare @table table (fullName varchar(256))
insert into @table values
('First Middle Last'),
('John Mary-Lou Smith'),
('Frank NMN Sanatra')
select
CHARINDEX(' ',fullName,1) -- position of the first space
,left(fullName,CHARINDEX(' ',fullName,1) - 1) as FirstName
-- everything between the first space and the last space
,substring(fullName,CHARINDEX(' ',fullName,1) + 1,(len(fullName) - CHARINDEX(' ',fullName,1)) - charindex(' ',reverse(fullName),1)) as MiddleName
-- subtract 1 so the space itself is not included
,right(fullName,charindex(' ',reverse(fullName),1) - 1) as LastName
from
@table
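The positions those expressions rely on (first space, last space, and whatever sits between them) can be checked quickly in Python. This is only a sketch under stated assumptions: split_full_name is a hypothetical helper, and returning a single-word name as the first name is my own choice, not something the question specifies.

```python
def split_full_name(full_name):
    """Split on the first and the last space; the middle part may be empty."""
    name = full_name.strip()
    # Everything before the first space is the first name.
    first, _, rest = name.partition(" ")
    # Everything after the last space is the last name.
    middle, _, last = rest.rpartition(" ")
    return first, middle.strip(), last

for sample in ("First Middle Last", "John Mary-Lou Smith", "Joe Bloggs"):
    print(split_full_name(sample))
# ('First', 'Middle', 'Last')
# ('John', 'Mary-Lou', 'Smith')
# ('Joe', '', 'Bloggs')
```

With only two words the middle name comes back empty, which covers the "firstname lastname" format mentioned in the question.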

Split fullname into first and last name

If I have a table with a column that contains fullnames such as:
fullname
------------
Joe Bloggs
Peter Smith
Mary Jones and Liz Stone
How can I retrieve the first and last name from each of the entries in the full name column using SQL? I'm not worried about the second name in the third entry in my example, i.e. Liz Stone.
So basically to retrieve
Firstname
---------
Joe
Peter
Mary
Lastname
--------
Bloggs
Smith
Jones
Here is a pre SQL Server 2016 method, which uses basic string functions to isolate the first and last names.
SELECT SUBSTRING(fullname, 1, CHARINDEX(' ', fullname) - 1) AS Firstname,
SUBSTRING(fullname,
CHARINDEX(' ', fullname) + 1,
LEN(fullname) - CHARINDEX(' ', fullname)) AS Lastname
FROM yourTable
Note that this solution assumes that the fullname column only contains a single first name and a single last name (i.e. no middle names, initials, etc.).
This is a slippery slope and there are no easy answers. That said, consider the following
Declare @YourTable table (FullName varchar(50))
Insert Into @YourTable values
('Joe Bloggs'),
('Peter Smith'),
('Betty Jane Martinez'),
('Mary Jones and Liz Stone')
Select A.*
,FirstName = Pos1+case when Pos3 is not null then ' '+Pos2 else '' end
,LastName = case when Pos3 is null then Pos2 else Pos3 end
From @YourTable A
Cross Apply (
Select Pos1 = xDim.value('/x[1]','varchar(max)')
,Pos2 = xDim.value('/x[2]','varchar(max)')
,Pos3 = xDim.value('/x[3]','varchar(max)')
,Pos4 = xDim.value('/x[4]','varchar(max)')
,Pos5 = xDim.value('/x[5]','varchar(max)')
,Pos6 = xDim.value('/x[6]','varchar(max)')
From (Select Cast('<x>' + replace((Select substring(FullName,1,charindex(' and ',FullName+' and ')-1) as [*] For XML Path('')),' ','</x><x>')+'</x>' as xml) as xDim) as A
) B
Returns
FullName | FirstName | LastName
----------------------------------------------
Joe Bloggs | Joe | Bloggs
Peter Smith | Peter | Smith
Betty Jane Martinez | Betty Jane | Martinez
Mary Jones and Liz Stone | Mary | Jones
If it helps with the visual, the CROSS APPLY generates
SELECT CASE
WHEN CHARINDEX(' ', FullName) > 0
THEN SUBSTRING(FullName, 1, LEN(FullName) - CHARINDEX(' ', REVERSE(FullName)))
ELSE ''
END AS FirstName,
CASE
WHEN CHARINDEX(' ', FullName) > 0
THEN REVERSE(SUBSTRING(REVERSE(FullName),
1,
CHARINDEX(' ', REVERSE(FullName)) - 1))
ELSE FullName
END AS LastName
FROM(VALUES('Mary Anne Bloggs'), ('Joe Bloggs'), ('Bloggs')) AS T(FullName);
This version checks that there is a space in the full name to split on. If there isn't, the first name is set to an empty string and the full name is put into the surname. Also, REVERSE is used to split on the last space when there is more than one space.
I use this query to retrieve the first and last name:
SELECT
SUBSTRING(FULLNAME, 1, CASE WHEN CHARINDEX(' ', FULLNAME)>0 THEN CHARINDEX(' ', FULLNAME) - 1 ELSE LEN(FULLNAME) END ) AS Firstname,
REVERSE(SUBSTRING(REVERSE(FULLNAME), 1, CASE WHEN CHARINDEX(' ', REVERSE(FULLNAME))>0 THEN CHARINDEX(' ', REVERSE(FULLNAME)) - 1 ELSE LEN(REVERSE(FULLNAME)) END ) ) AS Lastname
FROM HRMDESFO.EMPLOID
BigQuery: Standard SQL
substr(name,1,STRPOS(name,' ')-1) as FirstName,
substr(name,STRPOS(name,' ')+1,length(name)) as LastName
This is the easiest and shortest answer to this question. You can enhance it further by wrapping the input in RTRIM(LTRIM(...)), in case there are any spaces before or after the string:
Select
substring('Firstname Lastname',1,CHARINDEX(' ', 'Firstname Lastname') - 1) as firstname,
substring('Firstname Lastname',CHARINDEX(' ', 'Firstname Lastname') + 1,LEN('Firstname Lastname')) as Lastname
select passemail, substring(passemail,1,instr(passemail,'@') - 1) as name,
substring(passemail,instr(passemail,'@') + 1,length(passemail)) from passenger
For getting the first name:
SELECT SUBSTR(FULLNAME, 1, LOCATE(' ', FULLNAME) - 1) AS FIRSTNAME FROM EmployeeDetails;
For the last name:
SELECT SUBSTR(FULLNAME, LOCATE(' ', FULLNAME) + 1) AS LASTNAME FROM EmployeeDetails;
So, combined:
SELECT SUBSTR(FULLNAME, 1, LOCATE(' ', FULLNAME) - 1) AS FIRSTNAME, SUBSTR(FULLNAME, LOCATE(' ', FULLNAME) + 1) AS LASTNAME FROM EmployeeDetails;
SELECT
LEFT(column_name, POSITION(' ' IN column_name)-1) AS first_name,
RIGHT(column_name, LENGTH(column_name) - POSITION(' ' IN column_name)) AS last_name
FROM table_name
SELECT SUBSTRING(candidate_name, 1, CASE WHEN CHARINDEX(' ', candidate_name)>0 THEN CHARINDEX(' ', candidate_name) - 1
ELSE LEN(candidate_name) END ) AS Firstname,
SUBSTRING(substring(candidate_name,CHARINDEX(' ', candidate_name)+1,LEN(candidate_name)), 1,
CASE WHEN CHARINDEX(' ', candidate_name)>1 THEN CHARINDEX(' ', substring(candidate_name,CHARINDEX(' ', candidate_name)+1,LEN(candidate_name)))
ELSE null END ) AS middle_name,
REVERSE(SUBSTRING(REVERSE(candidate_name), 1,
CASE WHEN CHARINDEX(' ', REVERSE(candidate_name))>0
THEN CHARINDEX(' ', REVERSE(candidate_name)) - 1
ELSE null END ) )AS last_name
FROM Test_name
First we have to find the index of the space (' '), because the space is the character that separates the two words (first_name + ' ' + last_name).
In my case, mid_index is the column alias that stores the index of the space.
SELECT fullname, STRPOS(fullname, ' ') AS "mid_index"
FROM yourTable_name
Now we will use mid_index to find the left and right sides of the words. For this, we can use a subquery.
Below is the final query:
SELECT fullname,
LEFT(fullname, mid_index - 1) AS "first_name",
RIGHT(fullname, LENGTH(fullname) - mid_index) AS "last_name"
FROM
(
SELECT fullname, STRPOS(fullname, ' ') AS "mid_index"
FROM yourTable_name
) AS t1
SELECT
SUBSTR(NAME, 1, LOCATE(' ', NAME) - 1) AS FIRSTNAME
, SUBSTR(NAME, LOCATE(' ', NAME) + 1) AS LASTNAME
FROM yourTABLE;
For SQL Server:
SELECT
SUBSTRING(fullname, 0, CHARINDEX(' ', fullname)) AS FirstName
,SUBSTRING(fullname, CHARINDEX(' ', fullname) + 1, LEN(fullname)) AS LastName
FROM [YourTable]
If your full name uses a delimiter other than a space, such as a dash, substitute that delimiter, e.g.:
SELECT
SUBSTRING(fullname, 0, CHARINDEX('-', fullname)) AS FirstName
,SUBSTRING(fullname, CHARINDEX('-', fullname) + 1, LEN(fullname)) AS LastName
FROM [YourTable]
In PostgreSQL:
SELECT fullname,
SUBSTRING(fullname, 1, POSITION(' ' IN fullname) - 1) AS first_name,
SUBSTRING(fullname, POSITION(' ' IN fullname) + 1) AS lastname
FROM details;
You can use STRING_SPLIT (SQL Server 2016 and later):
STRING_SPLIT (string , separator)

Identifying duplicates within a table: looking for query advice

I am trying to identify duplicated contact records within an account, and I'm looking for the best way to do this. There is an account table and a contact table. Below is the query I've come up with to give me what I need, but I feel like there is probably a better or more efficient way to do this, so I'm looking for any feedback or advice. Thanks in advance!
SELECT * FROM sysdba.CONTACT a WITH(NOLOCK)
WHERE EXISTS
(
SELECT ACCOUNTID, FIRSTNAME, LASTNAME, EMAIL FROM sysdba.CONTACT b WITH(NOLOCK)
GROUP BY ACCOUNTID, FIRSTNAME, LASTNAME, EMAIL
HAVING COUNT(*) > 1
AND a.ACCOUNTID = b.ACCOUNTID AND a.FIRSTNAME = b.FIRSTNAME AND a.LASTNAME = b.LASTNAME AND a.EMAIL = b.EMAIL
)
ORDER BY ACCOUNTID, FIRSTNAME, LASTNAME, EMAIL
Here is another way I can do this, but having to use DISTINCT seems ugly.
SELECT DISTINCT a.CONTACTID, a.FIRSTNAME, a.LASTNAME, a.EMAIL FROM sysdba.CONTACT a WITH(NOLOCK)
JOIN sysdba.CONTACT b WITH(NOLOCK)
ON a.ACCOUNTID = b.ACCOUNTID AND a.FIRSTNAME = b.FIRSTNAME AND a.LASTNAME = b.LASTNAME AND a.EMAIL = b.EMAIL AND a.CONTACTID != b.CONTACTID
ORDER BY a.CONTACTID, a.FIRSTNAME, a.LASTNAME, a.EMAIL
When checking the execution plans for both, the first query is 37% compared to 63% for the second query, which is surprising, as I've always thought (apparently wrongly) that using joins is quicker than relying on a WHERE clause.
A common practice when trying to identify duplicates is to use window functions, such as COUNT() OVER (...) and ROW_NUMBER() OVER (...).
The query below should return groups of records where there is more than one CONTACTID for the same ACCOUNTID, FIRSTNAME, LASTNAME, EMAIL combination. In other words, it returns the records that have duplicates, along with their duplicates:
;WITH cteCONTACT
AS (
SELECT ACCOUNTID, FIRSTNAME, LASTNAME, EMAIL, CONTACTID,
CNT = COUNT(*) OVER (PARTITION BY ACCOUNTID, FIRSTNAME, LASTNAME, EMAIL)
FROM sysdba.CONTACT
)
SELECT ACCOUNTID, FIRSTNAME, LASTNAME, EMAIL, CONTACTID
FROM cteCONTACT
WHERE CNT > 1;
And the following query should return only the duplicates, without the original record that each one duplicates:
;WITH cteCONTACT
AS (
SELECT ACCOUNTID, FIRSTNAME, LASTNAME, EMAIL, CONTACTID,
NUM = ROW_NUMBER() OVER (
PARTITION BY ACCOUNTID, FIRSTNAME, LASTNAME, EMAIL
ORDER BY CONTACTID)
FROM sysdba.CONTACT
)
SELECT ACCOUNTID, FIRSTNAME, LASTNAME, EMAIL, CONTACTID
FROM cteCONTACT
WHERE NUM > 1;
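To make the difference between the two filters concrete, here is a rough Python sketch of the same grouping logic. It is not the T-SQL itself: the rows are invented, and plain dictionaries stand in for the PARTITION BY clause.

```python
from collections import defaultdict

# (CONTACTID, ACCOUNTID, FIRSTNAME, LASTNAME, EMAIL); invented rows for illustration.
contacts = [
    (1, "A1", "Ann", "Lee", "ann@example.com"),
    (2, "A1", "Ann", "Lee", "ann@example.com"),
    (3, "A1", "Bob", "Ray", "bob@example.com"),
    (4, "A2", "Ann", "Lee", "ann@example.com"),
    (5, "A1", "Ann", "Lee", "ann@example.com"),
]

# Group CONTACTIDs by (ACCOUNTID, FIRSTNAME, LASTNAME, EMAIL), like PARTITION BY.
groups = defaultdict(list)
for contact_id, *key in contacts:
    groups[tuple(key)].append(contact_id)

# Like COUNT(*) OVER (...) > 1: every member of a group that has duplicates.
with_originals = sorted(
    cid for ids in groups.values() if len(ids) > 1 for cid in ids
)

# Like ROW_NUMBER() OVER (... ORDER BY CONTACTID) > 1: skip the first row per group.
duplicates_only = sorted(
    cid for ids in groups.values() for cid in sorted(ids)[1:]
)

print(with_originals)   # [1, 2, 5]
print(duplicates_only)  # [2, 5]
```

Contact 4 has the same name and email but belongs to a different account, so it lands in its own group and is not reported by either filter.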