Postgres - use a dictionary to convert column names? - postgresql

I have a table with columns FIRSTNAME LASTNAME and I want to create a third column that combines those two columns into FIRSTNAME_LASTNAME but ALSO uses a special dictionary to convert some of the names. Say I just want to apply it to the FIRSTNAME, e.g.:
Albert -> Funnyguy, Kathleen -> Nerd, Megan -> Weirdo
So the new column for the "Albert Jones" row would be "Funnyguy_Jones".
Currently I do this in psycopg2 by reading in all the rows (in batches because the db is huge), using a python dictionary to convert and create the new column, then sending out the updates with UPDATE table SET newcol = tmp.newcol FROM (VALUES ...) etc. This is very slow because of reading it into python. Any tips?
EDIT: not all of the names have conversions (only like 10% of them do, for those I want to keep the original name)

If left join has a match COALESCE will choose t2.newName, other wise you will choose t1.firstName
SELECT t1.firstName,
t1.lastName,
COALESCE(t2.newName, t1.firstName) + '_' + t1.lastName as combinedName
FROM firstTable t1
LEFT JOIN newTable t2
ON t1.firstName = t2.firstName

Related

Create rows from part of column names

Source data
I am working on an ELT project to load data from CSV files into PostgreSQL where I will transform it. The CSV files have many columns that are consistent across files, but also contain activity columns that are inconsistent with names like Date (05/19/2020), Type (05/19/2020), etc.
In the loading script I am merging all of the columns with dates in the column name into one jsonb column so I don't have to constantly add new columns to the raw data table.
The resulting jsonb column in the raw data table looks like this:
id
activity
12345678
{"Date (05/19/2020)": null, "Type (05/19/2020)": null, "Date (06/03/2020)": "06/01/2020", "Type (06/03/2020)": "E"}
98765432
{"Date (05/19/2020)": "05/18/2020", "Type (05/19/2020)": "B", "Date (10/23/2020)": "10/26/2020", "Type (10/23/2020)": "T"}
JSON to columns
Using the amazing create_jsonb_flat_view function from this post I can convert the jsonb to columns like this:
id
Date (05/19/2020)
Type (05/19/2020)
Date (06/03/2020)
Type (06/03/2020)
Type (10/23/2020
Date (10/23/2020)
Type (10/23/2020)
10629465
null
null
06/01/2020
E
98765432
05/18/2020
B
10/26/2020
T
Need to move part of column name to row
Now, this is where I'm stuck. I need to remove the portion of the column name that is the Activity Date (e.g. (05/19/2020)) and create a row for each id and ActivityDate with additional columns for Date and Type like this:
id
ActivityDate
Date
Type
12345678
05/19/2020
null
null
12345678
06/03/2020
06/01/2020
E
98765432
05/19/2020
05/18/2020
B
98765432
10/23/2020
10/26/2020
T
I followed your link to the create_jsonb_flat_view article yesterday and then forgot this question. While I thank you for pointing me there, I think that mentioning it worked against you.
A more conventional approach using regexp_replace() works here. I left the date values as strings, but you can convert them with to_date() if needed:
with parse as (
select id, e.k, e.v,
regexp_replace(e.k, '\s+\([0-9/]{10}\)', '') as k_no_date,
regexp_replace(e.k, '^.+([0-9/]{10}).+', '\1') as k_date_only
from rawinput
cross join lateral jsonb_each_text(activity) as e(k, v)
)
select id,
k_date_only as activity_date,
min(v) filter (where k_no_date = 'Date') as date,
min(v) filter (where k_no_date = 'Type') as type
from parse
group by id, k_date_only;
db<>fiddle here
#Mike-Organek's Answer works beautifully!
However, I was curious if the regexp_replace() calls might be slowing the query down a bit and it seemed I could get the same results using a simpler function.
Since Mike gave me a great example to start with I modified it to split on the space between Date and (05/19/2020).
For 20,000 rows, it went from taking an avg of 7 sec on my local machine to an avg of .9 sec.
Here is the resulting query:
with parse as (
select id, e.k, e.v,
split_part(e.k, ' ', 1) as k_no_date,
trim(split_part(e.k, ' ', 2),'()') as k_date_only
from rawinput
cross join lateral jsonb_each_text(activity) as e(k, v)
)
select id,
k_date_only as activity_date,
min(v) filter (where k_no_date = 'Date') as date,
min(v) filter (where k_no_date = 'Type') as type
from parse
group by id, k_date_only;

Fetch rows from postgres table which contains a specific id in jsonb[] column

I have a details table with adeet column defined as jsonb[]
a sample value stored in adeet column is as below image
Sample data stored in DB :
I want to return the rows which satisfies id=26088 i.e row 1 and 3
I have tried array operations and json operations but it does'nt work as required. Any pointers
Obviously the type of the column adeet is not of type JSON/JSONB, but maybe VARCHAR and we should fix the format so as to convert into a JSONB type. I used replace() and r/ltrim() funcitons for this conversion, and preferred to derive an array in order to use jsonb_array_elements() function :
WITH t(jobid,adeet) AS
(
SELECT jobid, replace(replace(replace(adeet,'\',''),'"{','{'),'}"','}')
FROM tab
), t2 AS
(
SELECT jobid, ('['||rtrim(ltrim(adeet,'{'), '}')||']')::jsonb as adeet
FROM t
)
SELECT t.*
FROM t2 t
CROSS JOIN jsonb_array_elements(adeet) j
WHERE (j.value ->> 'id')::int = 26088
Demo
You want to combine JSONB's <# operator with the generic-array ANY construct.
select * from foobar where '{"id":26088}' <# ANY (adeet);

postgresql lef join doesn't works

I would like to add to the table A all the column of the table B, doing a join based on a common column (type numeric). I am trying to do it using the LEFT JOIN but the columns added are all blank. this is impossible because table b stores, among others, the same ID values . Where I am wrong?
Select * from "2017_01" left join "Registry_2017" on '2017_01.ID' = 'Registry_2017.ID';
You are doing wrong.. I don't know why you can use right for Table calling "2017_01" and different with this '2017_01.ID'..
' = Single quote identifies as String
" = Double quote identifies as Table or Column to escape Naming
Select
*
From
"2017_01"
left join "Registry_2017" on '2017_01.ID' = 'Registry_2017.ID';
So when you doing this '2017_01.ID' = 'Registry_2017.ID' The condition will always become false because those 2 different String are not equal. Postgresql look the condition not as Table and Column but String because you are using Single quote
Select
*
from
"2017_01"
left join "Registry_2017" on "2017_01"."ID" = "Registry_2017"."ID";
So the query should be like that.. Even you already got answer and it got work i must tell this..

Temporary Table Value into a Table-Value UDF

I was having some trouble with an SQL 2k sproc and which we moved to SQL 2k5 so we could used Table Value UDF's instead of Scalar UDF's.
This is simplified, but this is my problem.
I have a temporary table that I fill up with product information. I then pass that product information into a UDF and return the information back to my main results set. It doesn't seem to work.
Am I not allowed to pass a Temporary Table value into an CROSS APPLY'd Table Value UDF?
--CREATE AND FILL #brandInfo
SELECT sku, upc, prd_id, cp.customerPrice
FROM products p
JOIN #brandInfo b ON p.brd_id=b.brd_id
CROSS APPLY f_GetCustomerPrice(b.priceAdjustmentValue, b.priceAdjustmentAmount, p.Price) cp
--f_GetCUstomerPrice uses the AdjValue, AdjAmount, and Price to calculate users actual price
When I put dummy values in for b.priceAdjustmentValue and b.priceAdjustmentAmount it works great. But as soon as I try to load the temp table values in it bombs.
Msg 207, Level 16, State 1, Line 140
Invalid column name 'b.priceAdjustmentValue'.
Msg 207, Level 16, State 1, Line 140
Invalid column name 'b.priceAdjustmentAmount'.
Have you tried:
--CREATE AND FILL #brandInfo
SELECT sku, upc, prd_id, cp.customerPrice
FROM products p
JOIN #brandInfo b ON p.brd_id=b.brd_id
CROSS APPLY (
SELECT *
FROM f_GetCustomerPrice(b.priceAdjustmentValue, b.priceAdjustmentAmount, p.Price) cp
)
--f_GetCUstomerPrice uses the AdjValue, AdjAmount, and Price to calculate users actual price
Giving the UDF the proper context in order to resolve the column references?
EDIT:
I have built the following UDF in my local Northwind 2005 database:
CREATE FUNCTION dbo.f_GetCustomerPrice(#adjVal DECIMAL(28,9), #adjAmt DECIMAL(28,9), #price DECIMAL(28,9))
RETURNS TABLE
AS RETURN
(
SELECT Level = 'One', AdjustValue = #adjVal, AdjustAmount = #adjAmt, Price = #price
UNION
SELECT Level = 'Two', AdjustValue = 2 * #adjVal, AdjustAmount = 2 * #adjAmt, Price = 2 * #price
)
GO
And referenced it in the following query without issue:
SELECT p.ProductID,
p.ProductName,
b.CompanyName,
f.Level
FROM Products p
JOIN Suppliers b
ON p.SupplierID = b.SupplierID
CROSS APPLY dbo.f_GetCustomerPrice(p.UnitsInStock, p.ReorderLevel, p.UnitPrice) f
Are you certain that your definition of #brandInfo has the priceAdjustmentValue and priceAdjustmentAmount columns defined on it? More importantly, if you are putting this in a stored procedure as you mentioned, does there exist a #brandInfo table already without those columns defined? I know #brandInfo is a temporary table, but if it exists at the time you attempt to create the stored procedure and it lacks the columns, the parsing engine may be getting tripped up. Oddly, if the table doesn't exist at all, the parsing engine simply glides past the missing table and creates the SP for you.

help with TSQL IN statement with int

I am trying to create the following select statement in a stored proc
#dealerids nvarchar(256)
SELECT *
FROM INVOICES as I
WHERE convert(nvarchar(20), I.DealerID) in (#dealerids)
I.DealerID is an INT in the table. and the Parameter for dealerids would be formatted such as
(8820, 8891, 8834)
When I run this with parameters provided I get no rows back. I know these dealerIDs should provided rows as if I do it individually I get back what I expect.
I think I am doing
WHERE convert(nvarchar(20), I.DealerID) in (#dealerids)
incorrectly. Can anyone point out what I am doing wrong here?
Use a table values parameter (new in SQl Server 2008). Set it up by creating the actual table parameter type:
CREATE TYPE IntTableType AS TABLE (ID INTEGER PRIMARY KEY)
Your procedure would then be:
Create Procedure up_TEST
#Ids IntTableType READONLY
AS
SELECT *
FROM ATable a
WHERE a.Id IN (SELECT ID FROM #Ids)
RETURN 0
GO
if you can't use table value parameters, see: "Arrays and Lists in SQL Server 2005 and Beyond, When Table Value Parameters Do Not Cut it" by Erland Sommarskog, then there are many ways to split string in SQL Server. This article covers the PROs and CONs of just about every method. in general, you need to create a split function. This is how a split function can be used:
SELECT
*
FROM YourTable y
INNER JOIN dbo.yourSplitFunction(#Parameter) s ON y.ID=s.Value
I prefer the number table approach to split a string in TSQL but there are numerous ways to split strings in SQL Server, see the previous link, which explains the PROs and CONs of each.
For the Numbers Table method to work, you need to do this one time table setup, which will create a table Numbers that contains rows from 1 to 10,000:
SELECT TOP 10000 IDENTITY(int,1,1) AS Number
INTO Numbers
FROM sys.objects s1
CROSS JOIN sys.objects s2
ALTER TABLE Numbers ADD CONSTRAINT PK_Numbers PRIMARY KEY CLUSTERED (Number)
Once the Numbers table is set up, create this split function:
CREATE FUNCTION [dbo].[FN_ListToTable]
(
#SplitOn char(1) --REQUIRED, the character to split the #List string on
,#List varchar(8000)--REQUIRED, the list to split apart
)
RETURNS TABLE
AS
RETURN
(
----------------
--SINGLE QUERY-- --this will not return empty rows
----------------
SELECT
ListValue
FROM (SELECT
LTRIM(RTRIM(SUBSTRING(List2, number+1, CHARINDEX(#SplitOn, List2, number+1)-number - 1))) AS ListValue
FROM (
SELECT #SplitOn + #List + #SplitOn AS List2
) AS dt
INNER JOIN Numbers n ON n.Number < LEN(dt.List2)
WHERE SUBSTRING(List2, number, 1) = #SplitOn
) dt2
WHERE ListValue IS NOT NULL AND ListValue!=''
);
GO
You can now easily split a CSV string into a table and join on it:
Create Procedure up_TEST
#Ids VARCHAR(MAX)
AS
SELECT * FROM ATable a
WHERE a.Id IN (SELECT ListValue FROM dbo.FN_ListToTable(',',#Ids))
You can't use #dealerids like that, you need to use dynamic SQL, like this:
#dealerids nvarchar(256)
EXEC('SELECT *
FROM INVOICES as I
WHERE convert(nvarchar(20), I.DealerID) in (' + #dealerids + ')'
The downside is that you open yourself up to SQL injection attacks unless you specifically control the data going into #dealerids.
There are better ways to handle this depending on your version of SQL Server, which are documented in this great article.
Split #dealerids into a table then JOIN
SELECT *
FROM INVOICES as I
JOIN
ufnSplit(#dealerids) S ON I.DealerID = S.ParsedIntDealerID
Assorted split functions here (I'd probably a numbers table in this case for a small string