Cast a PostgreSQL column to stored type - postgresql

I am creating a viewer for PostgreSQL. My SQL needs to sort on the type that is normal for that column. Take for example:
Table:
CREATE TABLE contacts (id serial primary key, name varchar)
SQL:
SELECT id::text FROM contacts ORDER BY id;
Gives:
1
10
100
2
Ok, so I change the SQL to:
SELECT id::text FROM contacts ORDER BY id::regtype;
Which results in:
1
2
10
100
Nice! But now I try:
SELECT name::text FROM contacts ORDER BY name::regtype;
Which results in:
invalid type name "my first string"
Google is no help. Any ideas? Thanks
Repeat: the error is not my problem. My problem is that I need to convert each column to text, but order by the normal type for that column.

regtype is an object identifier type, and there is no reason to use it unless you are referring to system objects (types, in this case).
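To see what regtype actually does: it resolves a type name to that type's OID, so only strings that name a type can be cast to it, which explains the error in the question:
SELECT 'integer'::regtype;          -- integer
SELECT 'integer'::regtype::oid;     -- 23, the OID of int4
SELECT 'my first string'::regtype;  -- ERROR: invalid type name "my first string"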
You should cast the column to integer in the first query:
SELECT id::text
FROM contacts
ORDER BY id::integer;
Alternatively, use a qualified column name in the ORDER BY clause; it always refers to the table column rather than the output column, and works with any sortable column type.
SELECT id::text
FROM contacts
ORDER BY contacts.id;

So, I found two ways to accomplish this. The first builds on the solution klin provided: query the table once, inspect the column types, then construct my own query based on them. An untested psycopg2 example:
# conn is an open psycopg2 connection
c = conn.cursor()
c.execute("SELECT * FROM contacts LIMIT 1")
sort_by_sql = ""
for row in c.description:
    if row.name == "my_sort_column":
        if row.type_code == 23:  # 23 is the OID of int4
            sort_by_sql = "ORDER BY " + row.name + "::integer"
        else:
            sort_by_sql = "ORDER BY " + row.name + "::text"
c.execute("SELECT * FROM contacts " + sort_by_sql)
A more elegant way would be like this:
SELECT id::text AS _id, name::text AS _name FROM contacts ORDER BY id
This uses aliases so that ORDER BY still picks up the original table column rather than the text output column. The last option is more readable if nothing else.
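To see why the alias matters: when a bare name in ORDER BY matches an output column name, PostgreSQL binds it to the output column first, and only falls back to the table column when no output column matches. With the question's table:
SELECT id::text FROM contacts ORDER BY id;        -- "id" is the text output column: 1, 10, 100, 2
SELECT id::text AS _id FROM contacts ORDER BY id; -- no output column "id", so the integer table column is used: 1, 2, 10, 100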

Related

Use postgresql query results to form another query

I am trying to select from one table using the select result from another table. I can run this in two queries but would like to optimize it into just one.
First query: select the ids that match the given paid value.
select id from lookuptable where paid = '547'
This results in something like this
6316352
6316353
6318409
6318410
6320468
6320469
6320470
6322526
6322527
6324586
6324587
6326648
I would like to then use this result to make another selection. I can do it manually like below. Note, there could be many rows with these values, so I've been using an IN statement.
select * from "othertable" where id in (6316352,6316353,6318409,6318410,6320468,6320469,6320470,6322526,6322527,6324586,6324587,6326648);
You can combine the two steps into a single query with a join:
select ot.*
from "othertable" as ot
join lookuptable as lt on ot.id = lt.id
where lt.paid = '547';
The IN operator supports not just value lists but also subqueries, so you can literally write
select * from "othertable" where id in (select id from lookuptable where paid = '547');

Create rows from part of column names

Source data
I am working on an ELT project to load data from CSV files into PostgreSQL where I will transform it. The CSV files have many columns that are consistent across files, but also contain activity columns that are inconsistent with names like Date (05/19/2020), Type (05/19/2020), etc.
In the loading script I am merging all of the columns with dates in the column name into one jsonb column so I don't have to constantly add new columns to the raw data table.
The resulting jsonb column in the raw data table looks like this:
id       | activity
---------+----------
12345678 | {"Date (05/19/2020)": null, "Type (05/19/2020)": null, "Date (06/03/2020)": "06/01/2020", "Type (06/03/2020)": "E"}
98765432 | {"Date (05/19/2020)": "05/18/2020", "Type (05/19/2020)": "B", "Date (10/23/2020)": "10/26/2020", "Type (10/23/2020)": "T"}
JSON to columns
Using the amazing create_jsonb_flat_view function from this post I can convert the jsonb to columns like this:
id       | Date (05/19/2020) | Type (05/19/2020) | Date (06/03/2020) | Type (06/03/2020) | Date (10/23/2020) | Type (10/23/2020)
---------+-------------------+-------------------+-------------------+-------------------+-------------------+------------------
10629465 | null              | null              | 06/01/2020        | E                 | null              | null
98765432 | 05/18/2020        | B                 | null              | null              | 10/26/2020        | T
Need to move part of column name to row
Now, this is where I'm stuck. I need to remove the portion of the column name that is the Activity Date (e.g. (05/19/2020)) and create a row for each id and ActivityDate with additional columns for Date and Type like this:
id       | ActivityDate | Date       | Type
---------+--------------+------------+-----
12345678 | 05/19/2020   | null       | null
12345678 | 06/03/2020   | 06/01/2020 | E
98765432 | 05/19/2020   | 05/18/2020 | B
98765432 | 10/23/2020   | 10/26/2020 | T
I followed your link to the create_jsonb_flat_view article yesterday and then forgot about this question. While I thank you for pointing me there, I think that mentioning it worked against you.
A more conventional approach using regexp_replace() works here. I left the date values as strings, but you can convert them with to_date() if needed:
with parse as (
    select id, e.k, e.v,
           regexp_replace(e.k, '\s+\([0-9/]{10}\)', '') as k_no_date,
           regexp_replace(e.k, '^.+([0-9/]{10}).+', '\1') as k_date_only
    from rawinput
    cross join lateral jsonb_each_text(activity) as e(k, v)
)
select id,
       k_date_only as activity_date,
       min(v) filter (where k_no_date = 'Date') as date,
       min(v) filter (where k_no_date = 'Type') as type
from parse
group by id, k_date_only;
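To make the parse step concrete, this is the key/value expansion jsonb_each_text produces for one sample value (taken from the question's data):
select e.k, e.v
from (select '{"Date (06/03/2020)": "06/01/2020", "Type (06/03/2020)": "E"}'::jsonb as activity) t
cross join lateral jsonb_each_text(t.activity) as e(k, v);
--         k         |     v
-- Date (06/03/2020) | 06/01/2020
-- Type (06/03/2020) | E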
@Mike-Organek's answer works beautifully!
However, I was curious whether the regexp_replace() calls might be slowing the query down a bit, and it seemed I could get the same results using simpler functions.
Since Mike gave me a great example to start with, I modified it to split on the space between Date and (05/19/2020).
For 20,000 rows, the query went from taking an average of 7 seconds on my local machine to an average of 0.9 seconds.
Here is the resulting query:
with parse as (
    select id, e.k, e.v,
           split_part(e.k, ' ', 1) as k_no_date,
           trim(split_part(e.k, ' ', 2), '()') as k_date_only
    from rawinput
    cross join lateral jsonb_each_text(activity) as e(k, v)
)
select id,
       k_date_only as activity_date,
       min(v) filter (where k_no_date = 'Date') as date,
       min(v) filter (where k_no_date = 'Type') as type
from parse
group by id, k_date_only;
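For reference, the two string functions split a key the same way the regular expressions did; a quick sanity check on a single key:
select split_part('Date (05/19/2020)', ' ', 1) as k_no_date,
       trim(split_part('Date (05/19/2020)', ' ', 2), '()') as k_date_only;
-- k_no_date | k_date_only
-- Date      | 05/19/2020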

How to work with data values formatted [{}, {}, {}]

I apologize if this is a simple question - I had some trouble even formatting the question when I was trying to Google for help!
In one of the tables I am working with, the data looks like this:
Invoice ID | Status    | Product List
-----------+-----------+-------------
1234       | Processed | [{"product_id":463153},{"product_id":463165},{"product_id":463177},{"pid":463218}]
I want to count how many products each order purchased. What is the proper way to count the values in the "Product List" column? I'm aware that plain count() is wrong, and that I probably need to extract the data from the string value. My current attempt:
select invoice_id, count(Product_list)
from quote_table
where status = 'processed'
group by invoice_id
You can use the JSON function json_array_length after casting the column to the json data type (provided the values are valid JSON), for example:
select invoice_id, json_array_length(Product_list::json) as count
from quote_table
where status = 'processed'
group by invoice_id;
invoice_id | count
------------+-------
1234 | 4
(1 row)
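If you store the column as jsonb rather than text, the companion function jsonb_array_length works the same way:
select invoice_id, jsonb_array_length(Product_list::jsonb) as count
from quote_table
where status = 'processed'
group by invoice_id;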
If you need to count only a specific property inside the json column, you can use the query below.
It solves the problem with create type, json_populate_recordset, and a subquery that counts the rows where product_id is present:
drop type if exists count_product;
create type count_product as (product_id int);

select
    t.invoice_id,
    t.status,
    (
        select count(*)
        from json_populate_recordset(
            null::count_product,
            t.Product_list
        )
        where product_id is not null
    ) as count_product_id
from (
    -- sample data standing in for the real table
    select
        1234 as invoice_id,
        'processed' as status,
        '[{"product_id":463153},{"product_id":463165},{"product_id":463177},{"pid":463218}]'::json as Product_list
) as t
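A variant that avoids creating a type, using json_array_elements to unnest the array and a filtered count (same sample data as above):
select invoice_id,
       count(*) filter (where elem ->> 'product_id' is not null) as count_product_id
from (
    select
        1234 as invoice_id,
        '[{"product_id":463153},{"product_id":463165},{"product_id":463177},{"pid":463218}]'::json as Product_list
) as t
cross join lateral json_array_elements(t.Product_list) as elem
group by invoice_id;
-- invoice_id | count_product_id
-- 1234       | 3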

Update a column with row_number with duplicate records without PK

I have a table with duplicate records but without a primary key. I want to update an empty column (NEW_KEY) with the PERSONNUMBER column concatenated with a row_number, so that after the update every duplicate row has a distinct key.
Since the table does not have a unique column, I would have to join back to a CTE or subquery. I know that in SQL Server it can be done like this:
UPDATE X
SET X.NEW_KEY = X.PERSONNUMBER + '-' + X.NEW_CODE_DEST
FROM (
    SELECT ROW_NUMBER() OVER (PARTITION BY PERSONNUMBER) AS NEW_CODE_DEST, PERSONNUMBER, NEW_KEY
    FROM EMPLOYEE
) AS X;
I tried same logic in postgresql but it didn't work. It threw an error:
SQL Error [42P01]: ERROR: relation "x" does not exist
I also tried this in PostgreSQL:
UPDATE EMPLOYEE
SET NEW_KEY = X.PERSONNUMBER || '-' || X.NEW_CODE_DEST
FROM (
    SELECT ROW_NUMBER() OVER (PARTITION BY PERSONNUMBER) AS NEW_CODE_DEST, PERSONNUMBER
    FROM EMPLOYEE
) AS X;
The NEW_KEY gets updated with duplicate values, no surprise there, since nothing ties each row to its own row number.
So is there an equivalent method in PostgreSQL to achieve the same update result, or do I have my query wrong?
Really appreciate the help!
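One way to do this in PostgreSQL, sketched against the employee/personnumber/new_key names from the query above: every row has a ctid system column identifying its physical location, which can stand in for the missing primary key when joining the numbered subquery back to the table:
UPDATE employee e
SET new_key = x.personnumber || '-' || x.new_code_dest
FROM (
    SELECT ctid,
           personnumber,
           -- numbers the duplicates within each personnumber group
           ROW_NUMBER() OVER (PARTITION BY personnumber) AS new_code_dest
    FROM employee
) AS x
WHERE e.ctid = x.ctid;
Note that ctid is only stable within a single statement (updates and VACUUM FULL can move rows), which is fine here because the numbering and the update happen in one statement.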

Postgres - use a dictionary to convert column names?

I have a table with columns FIRSTNAME and LASTNAME, and I want to create a third column that combines those two columns into FIRSTNAME_LASTNAME but ALSO uses a special dictionary to convert some of the names. Say I just want to apply it to FIRSTNAME, e.g.:
Albert -> Funnyguy, Kathleen -> Nerd, Megan -> Weirdo
So the new column for the "Albert Jones" row would be "Funnyguy_Jones".
Currently I do this in psycopg2 by reading in all the rows (in batches, because the db is huge), using a Python dictionary to convert and create the new column, then sending the updates back with UPDATE table SET newcol = tmp.newcol FROM (VALUES ...) etc. This is very slow because everything is read into Python. Any tips?
EDIT: not all of the names have conversions (only like 10% of them do, for those I want to keep the original name)
If the left join has a match, COALESCE will choose t2.newName; otherwise you fall back to t1.firstName. Note that PostgreSQL concatenates strings with ||, not +:
SELECT t1.firstName,
       t1.lastName,
       COALESCE(t2.newName, t1.firstName) || '_' || t1.lastName AS combinedName
FROM firstTable t1
LEFT JOIN newTable t2
    ON t1.firstName = t2.firstName;
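To apply the same idea as an update without round-tripping rows through Python, here is a sketch assuming a dictionary table newTable and a target column combinedName (both names are illustrative):
-- dictionary table built once from the Python dict
CREATE TABLE newTable (firstName text PRIMARY KEY, newName text);
INSERT INTO newTable (firstName, newName) VALUES
    ('Albert',   'Funnyguy'),
    ('Kathleen', 'Nerd'),
    ('Megan',    'Weirdo');

-- populate the new column entirely server-side;
-- names without a dictionary entry keep the original first name
ALTER TABLE firstTable ADD COLUMN combinedName text;
UPDATE firstTable t1
SET combinedName = COALESCE(
        (SELECT t2.newName FROM newTable t2 WHERE t2.firstName = t1.firstName),
        t1.firstName
    ) || '_' || t1.lastName;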