Can the categories in the postgres tablefunc crosstab() function be integers? - postgresql

It's all in the title. The documentation has something like this:
SELECT *
FROM crosstab('...') AS ct(row_name text, category_1 text, category_2 text);
I have two tables, lab_tests and lab_tests_results. All of the lab_tests_results rows are tied to the primary key id integer in the lab_tests table. I'm trying to make a pivot table where the lab tests (identified by an integer) are row headers and the respective results are in the table. I can't get around a syntax error at or around the integer.
Is this possible with the current set up? Am I missing something in the documentation? Or do I need to perform an inner join of sorts to make the categories strings? Or modify the lab_tests_results table to use a text identifier for the lab tests?
Thanks for the help, all. Much appreciated.
Edit: Got it figured out with the help of Dmitry. He had the data layout figured out, but I was unclear on what kind of output I needed. I was trying to get the pivot table to be based on batch_id numbers in the lab_tests_results table. I had to hammer out the base query and cast the data types.
SELECT *
FROM crosstab('SELECT lab_tests_results.batch_id, lab_tests.test_name, lab_tests_results.test_result::FLOAT
FROM lab_tests_results, lab_tests
WHERE lab_tests.id=lab_tests_results.lab_test AND (lab_tests.test_name LIKE ''Test Name 1'' OR lab_tests.test_name LIKE ''Test Name 2'')
ORDER BY 1,2'
) AS final_result(batch_id VARCHAR, test_name_1 FLOAT, test_name_2 FLOAT);
This provides a pivot table from the lab_tests_results table like below:
batch_id | test_name_1 | test_name_2
---------+-------------+------------
batch1   | result1     | <null>
batch2   | result2     | result3
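One caveat with the one-argument form of crosstab(): it fills values left to right, so if a batch is missing one of the tests, a result can end up under the wrong column. The two-argument form pins each value to its category instead; a sketch using the same tables and columns as above:
SELECT *
FROM crosstab(
    'SELECT lab_tests_results.batch_id, lab_tests.test_name, lab_tests_results.test_result::FLOAT
     FROM lab_tests_results
     JOIN lab_tests ON lab_tests.id = lab_tests_results.lab_test
     ORDER BY 1, 2',
    'VALUES (''Test Name 1''), (''Test Name 2'')'
) AS final_result(batch_id VARCHAR, test_name_1 FLOAT, test_name_2 FLOAT);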

If I understand correctly, your tables look something like this:
CREATE TABLE lab_tests (
    id INTEGER PRIMARY KEY,
    name VARCHAR(500)
);
CREATE TABLE lab_tests_results (
    id INTEGER PRIMARY KEY,
    lab_tests_id INTEGER REFERENCES lab_tests (id),
    result TEXT
);
And your data looks something like this:
INSERT INTO lab_tests (id, name)
VALUES (1, 'test1'),
       (2, 'test2');
INSERT INTO lab_tests_results (id, lab_tests_id, result)
VALUES (1, 1, 'result1'),
       (2, 1, 'result2'),
       (3, 2, 'result3'),
       (4, 2, 'result4'),
       (5, 2, 'result5');
First of all, crosstab() is part of the tablefunc extension, so you need to enable it:
CREATE EXTENSION tablefunc;
You only need to run this once per database, as per this answer.
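If you are not sure whether the extension is already installed, the IF NOT EXISTS variant is safe to re-run:
CREATE EXTENSION IF NOT EXISTS tablefunc;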
The final query will look like this:
SELECT *
FROM crosstab(
'SELECT lt.name::TEXT, lt.id, ltr.result
FROM lab_tests AS lt
JOIN lab_tests_results ltr ON ltr.lab_tests_id = lt.id'
) AS ct(test_name text, result_1 text, result_2 text, result_3 text);
Explanation:
The crosstab() function takes the text of a query which should return 3 columns: (1) a row name, (2) a category, and (3) a value. The wrapping query just selects everything crosstab() returns and defines the list of output columns (the part after AS). The first is the row name (test_name), followed by the value columns (result_1, result_2, result_3). In my query I'll get up to 3 results per test: if a test has more than 3 results I won't see the extras, and if it has fewer than 3 I'll get nulls in the remaining columns.
The result for this query is:
test_name | result_1 | result_2 | result_3
----------+----------+----------+---------
test1     | result1  | result2  | <null>
test2     | result3  | result4  | result5

Related

Split the column value and make key as column name in postgres query

I have a table with a column whose values look like this:
data_as_of_date:20210202 unique_cc:3999
data_as_of_date:20220202 unique_cc:1999
I need to convert this column into something like this:
data_as_of_date | unique_cc
20210202        | 3999
20220202        | 1999
Sample data:
create table test (val varchar);
insert into test(val) values ('data_as_of_date:20210202 unique_cc:3999');
insert into test(val) values ('data_as_of_date:20220202 unique_cc:1999');
I have tried unnest with string_to_array and the crosstab function, but it is not working.
You don't need unnest or a crosstab for this. A simple regular expression should do the trick:
select substring(the_column from 'data_as_of_date:([0-9]{8})') as data_as_of_date,
       substring(the_column from 'unique_cc:([0-9]{4})') as unique_cc
from the_table;
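Run against the sample table (substituting val and test for the placeholder names the_column and the_table), this should return:
data_as_of_date | unique_cc
----------------+----------
20210202        | 3999
20220202        | 1999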

Extract all the values in jsonb into a row

I'm using PostgreSQL 11 and I have a jsonb value that represents a row of a table. It looks like:
{"userid":"test","rolename":"Root","loginerror":0,"email":"superadmin@ae.com",...,"thirdpartyauthenticationkey":{}}
Is there any way to gather all the "values" of the jsonb into a single comma-separated string, without the keys?
The string I want to obtain from the jsonb above looks like:
(test, Root, 0, superadmin@ae.com, ..., {})
I need to keep the ORDER of those values the same as their keys appear in the jsonb. Can I do that with PostgreSQL?
You can use the jsonb_populate_record function (assuming your json data does match the users table). This forces the text value to match the column order of your users table:
Schema (PostgreSQL v13)
CREATE TABLE users (
    userid text,
    rolename text,
    loginerror int,
    email text,
    thirdpartyauthenticationkey json
);
Query #1
WITH d(js) AS (
    VALUES
        ('{"userid":"test", "rolename":"Root", "loginerror":0, "email":"superadmin@ae.com", "thirdpartyauthenticationkey":{}}'::jsonb),
        ('{"userid":"other", "rolename":"User", "loginerror":324, "email":"nope@ae.com", "thirdpartyauthenticationkey":{}}'::jsonb)
)
SELECT jsonb_populate_record(null::users, js),
       jsonb_populate_record(null::users, js)::text AS record_as_text,
       pg_typeof(jsonb_populate_record(null::users, js)::text)
FROM d;
jsonb_populate_record              | record_as_text                     | pg_typeof
-----------------------------------+------------------------------------+----------
(test,Root,0,superadmin@ae.com,{}) | (test,Root,0,superadmin@ae.com,{}) | text
(other,User,324,nope@ae.com,{})    | (other,User,324,nope@ae.com,{})    | text
Note that if you're building this string to insert it back into PostgreSQL, then you don't need to do that, since the result of jsonb_populate_record will already match your table:
Query #2
WITH d(js) AS (
    VALUES
        ('{"userid":"test", "rolename":"Root", "loginerror":0, "email":"superadmin@ae.com", "thirdpartyauthenticationkey":{}}'::jsonb),
        ('{"userid":"other", "rolename":"User", "loginerror":324, "email":"nope@ae.com", "thirdpartyauthenticationkey":{}}'::jsonb)
)
INSERT INTO users
SELECT (jsonb_populate_record(null::users, js)).*
FROM d;
There are no results to be displayed.
Query #3
SELECT * FROM users;
userid | rolename | loginerror | email             | thirdpartyauthenticationkey
-------+----------+------------+-------------------+----------------------------
test   | Root     | 0          | superadmin@ae.com | {}
other  | User     | 324        | nope@ae.com       | {}
You can use jsonb_each_text() to get a set of text representations of the elements, string_agg() to aggregate them into a comma-separated string, and concat() to wrap that in parentheses.
SELECT concat('(', string_agg(value, ', '), ')')
FROM jsonb_each_text('{"userid":"test","rolename":"Root","loginerror":0,"email":"superadmin@ae.com","thirdpartyauthenticationkey":{}}'::jsonb) jet (key, value);
You didn't provide the DDL and DML of the table the JSON may reside in (if it does; that isn't clear from your question). The demonstration above therefore only uses the JSON you showed as a scalar. If you do have a table, you need to CROSS JOIN LATERAL and GROUP BY some key.
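For example, a sketch assuming a hypothetical table the_table with a primary key id and a jsonb column js:
SELECT concat('(', string_agg(value, ', '), ')')
FROM the_table t
CROSS JOIN LATERAL jsonb_each_text(t.js) jet (key, value)
GROUP BY t.id;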
Edit:
If you need to be sure the order is retained and you don't have that defined in a table's structure as @Marth's answer assumes, then you can of course extract every value manually in the order you need them:
SELECT concat('(',
              concat_ws(', ',
                        j->>'userid',
                        j->>'rolename',
                        j->>'loginerror',
                        j->>'email',
                        j->>'thirdpartyauthenticationkey'),
              ')')
FROM (VALUES ('{"userid":"test","rolename":"Root","loginerror":0,"email":"superadmin@ae.com","thirdpartyauthenticationkey":{}}'::jsonb)) v (j);

How to break apart a column includes keys and values into separate columns in postgres

I am new to Postgres and basically have no experience. I have a table with a column that includes keys and values. I need to write a query that returns all the columns of the table, plus additional columns with each key as the column name and its value underneath.
My input is like:
id    | name | message
12478 | A    | {img_type:=png,key_id:=f235, client_status:=active, request_status:=open}
12598 | B    | {img_type:=none,address_id:=c156, client_status:=active, request_status:=closed}
The output will be:
id    | name | img_type | key_id | address_id | client_status | request_status
12478 | A    | png      | f235   | NULL       | active        | open
12598 | B    | none     | NULL   | c156       | active        | closed
Any help would be greatly appreciated.
The only thing I can think of is a regular expression to extract the key/value pairs.
select id, name,
(regexp_match(message, '(img_type:=)([^,}]+),{0,1}'))[2] as img_type,
(regexp_match(message, '(key_id:=)([^,}]+),{0,1}'))[2] as key_id,
(regexp_match(message, '(client_status:=)([^,}]+),{0,1}'))[2] as client_status,
(regexp_match(message, '(request_status:=)([^,}]+),{0,1}'))[2] as request_status
from the_table;
regexp_match returns an array of matches. As the regex contains two groups (one for the "key" and one for the "value"), the [2] takes the second element of the array.
This is quite expensive and error prone (e.g. if any of the values contains a , or if you need to deal with quoted values). If you have any chance to change the application that stores the value, you should seriously consider changing your code to store a proper JSON value, e.g.
{"img_type": "png", "key_id": "f235", "client_status": "active", "request_status": "open"}
Then you can use e.g. message ->> 'img_type' to retrieve the value for the key img_type.
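With the data stored that way, the pivot from the question becomes a few simple lookups; a sketch assuming the message column were converted to a jsonb column of the same name:
select id, name,
       message ->> 'img_type' as img_type,
       message ->> 'key_id' as key_id,
       message ->> 'address_id' as address_id,
       message ->> 'client_status' as client_status,
       message ->> 'request_status' as request_status
from the_table;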
You might also want to consider a properly normalized table, where each of those keys is a real column.
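For example (a sketch only, with a hypothetical table name and the column names taken from the sample data):
CREATE TABLE requests (
    id integer PRIMARY KEY,
    name text,
    img_type text,
    key_id text,
    address_id text,
    client_status text,
    request_status text
);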
I can do it with a function. I am not sure about the performance, but here is my suggestion:
CREATE TYPE log_type AS (img_type TEXT, key_id TEXT, address_id TEXT, client_status TEXT, request_status TEXT);

CREATE OR REPLACE FUNCTION populate_log(data TEXT)
RETURNS log_type AS
$func$
DECLARE
    r log_type;
BEGIN
    select x.* into r
    from (
        select json_object(array_agg(array_data)) as json_data
        from (
            select unnest(string_to_array(trim(unnest(string_to_array(substring(populate_log.data, '[^{}]+'), ','))), ':=')) as array_data
        ) d
    ) d2,
    lateral json_to_record(json_data) as x(img_type text, key_id text, address_id text, client_status text, request_status text);
    RETURN r;
END
$func$ LANGUAGE plpgsql;
with log_data (id, name, message) as (
    values
        (12478, 'A', '{img_type:=png,key_id:=f235, client_status:=active, request_status:=open}'),
        (12598, 'B', '{img_type:=none,address_id:=c156, client_status:=active, request_status:=closed}')
)
select id, name, l.*
from log_data, lateral populate_log(message) as l;
What you finally write in your query will be something like this (imagine the data is in a table named log_data):
select id, name, l.*
from log_data, lateral populate_log(message) as l;
I assume that the message column is text; in Postgres it might be an array, in which case you have to remove some of the conversions: string_to_array(substring(populate_log.data)) -> populate_log.data.

Postgresql inserts values falsely

I want to add a denormalized table for some data of a gtfs-feed. For that I created a new table:
CREATE TABLE denormalized_trips (
    stops_coords json NOT NULL,
    stops_object json NOT NULL,
    agency_key text NOT NULL,
    trip_id text NOT NULL,
    route_id text NOT NULL,
    service_id text NOT NULL,
    shape_id text,
    route_color text,
    route_long_name text,
    route_desc text,
    direction_id text
);
CREATE INDEX denormalized_trips_index ON denormalized_trips (agency_key, trip_id);
CREATE UNIQUE INDEX denormalized_trips_unique_index ON denormalized_trips (agency_key, route_id);
Now I want to transfer data from one table to the other via an insert statement. The statement is rather complex.
INSERT INTO denormalized_trips
SELECT
    trps.stops_coords,
    trps.stops_object,
    trps.trip_id,
    trps.service_id,
    trps.route_id,
    trps.direction_id,
    trps.agency_key,
    trps.shape_id,
    trps.route_color,
    trps.route_long_name,
    trps.route_desc
FROM (
    SELECT
        array_to_json(ARRAY_AGG(array[stop_lat, stop_lon])) AS stops_coords,
        array_to_json(ARRAY_AGG(array[
            stops.stop_id,
            CAST ( stop_times.stop_sequence AS TEXT ),
            stops.stop_name,
            stop_times.departure_time,
            CAST ( stop_times.departure_time_seconds AS TEXT ),
            stop_times.arrival_time,
            CAST ( stop_times.arrival_time_seconds AS TEXT )
        ])) AS stops_object,
        trips.trip_id,
        trips.service_id,
        trips.direction_id,
        trips.agency_key,
        trips.shape_id,
        routes.route_id,
        routes.route_color,
        routes.route_long_name,
        routes.route_desc
    FROM gtfs_stop_times AS stop_times
    INNER JOIN gtfs_trips AS trips
        ON trips.trip_id = stop_times.trip_id AND trips.agency_key = stop_times.agency_key
    INNER JOIN gtfs_routes AS routes
        ON trips.agency_key = routes.agency_key AND routes.route_id = trips.route_id
    INNER JOIN gtfs_stops AS stops
        ON stops.stop_id = stop_times.stop_id
        AND stops.agency_key = stop_times.agency_key
        AND NOT EXISTS (
            SELECT 0
            FROM denormalized_max_stop_sequence AS max
            WHERE max.agency_key = stop_times.agency_key
              AND max.trip_id = stop_times.trip_id
              AND max.trip_max = stop_times.stop_sequence
        )
    GROUP BY
        trips.trip_id,
        trips.service_id,
        trips.direction_id,
        trips.agency_key,
        trips.shape_id,
        routes.route_id,
        routes.route_color,
        routes.route_long_name,
        routes.route_desc
) as trps
If I just run the inner select statement, I get the right results. They look something like this (the screenshot does not show all columns because the result is too long):
But if I execute the insert statement and display the content of the table, I get something like this:
As you may notice, the contents are not inserted into the right columns of the table. The agency_key column now has the values of trip_id, and direction_id now holds service_id (and more columns are messed up in the same way).
So my question is: what am I doing wrong that my insert statement inserts the contents into the wrong columns of the newly created table?
Thanks for your help.
Postgres, by default, will insert your values in the order the columns are declared in the table; it has nothing to do with what your columns are named in the query.
https://www.postgresql.org/docs/9.5/static/sql-insert.html
If no list of column names is given at all, the default is all the columns of the table in their declared order; or the first N column names, if there are only N columns supplied by the VALUES clause or query.
You can alter your insert to declare the order of the columns you're inserting, or you can change the order of your select to match the order of columns in the table.
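For example, an explicit column list matching the SELECT order in the posted statement would look like this (a sketch; the rest of the statement stays unchanged):
INSERT INTO denormalized_trips
    (stops_coords, stops_object, trip_id, service_id, route_id,
     direction_id, agency_key, shape_id, route_color, route_long_name, route_desc)
SELECT ... -- same SELECT ... FROM (...) AS trps as in the question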

Postgresql Select all columns and column names with a specific value for a row

I have a table with many (1000+) columns and many rows (~1M). The columns either have the value 1 or are NULL.
I want to be able to select, for a specific row (user), the names of the columns that have a value of 1.
Since there are many columns in the table, specifying them all would yield an extremely long query.
You're doing something SQL is quite bad at - dynamic access to columns, or treating a row as a set. It'd be nice if this were easier, but it doesn't work well with SQL's typed nature and the concept of a relation. Working with your data set in its current form is going to be frustrating; consider storing an array, json, or hstore of values instead.
Actually, for this particular data model, you could probably use a bitfield. See bit(n) and bit varying(n).
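A minimal sketch of the bitfield idea (hypothetical table; substring positions on bit strings are 1-based):
CREATE TABLE user_flags (
    user_id serial PRIMARY KEY,
    flags   bit varying(1000)  -- one bit per former column
);
INSERT INTO user_flags (flags) VALUES (B'100'), (B'011');
-- users whose 3rd flag is set:
SELECT user_id FROM user_flags WHERE substring(flags from 3 for 1) = B'1';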
It's still possible to make a working query with your current model using PostgreSQL extensions, though.
Given sample:
CREATE TABLE blah (id serial primary key, a integer, b integer, c integer);
INSERT INTO blah(a,b,c) VALUES (NULL, NULL, 1), (1, NULL, 1), (NULL, NULL, NULL), (1, 1, 1);
I would unpivot each row into a key/value set using hstore (or, in newer PostgreSQL versions, the json functions). SQL itself provides no way to dynamically access columns, so we have to use an extension. So:
SELECT id, hs FROM blah, LATERAL hstore(blah) hs;
then extract the hstores to sets:
SELECT id, k, v FROM blah, LATERAL each(hstore(blah)) kv(k,v);
... at which point you can filter for values matching the criteria. Note that all columns have been converted to text, so you may want to cast them back:
SELECT id, k FROM blah, LATERAL each(hstore(blah)) kv(k,v) WHERE v::integer = 1;
You also need to exclude id from matching, so:
regress=> SELECT id, k FROM blah, LATERAL each(hstore(blah)) kv(k,v) WHERE v::integer = 1 AND k <> 'id';
id | k
----+---
1 | c
2 | a
2 | c
4 | a
4 | b
4 | c
(6 rows)
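On newer PostgreSQL versions the same thing works without the hstore extension, using the built-in JSON functions; a sketch equivalent to the query above:
SELECT id, k
FROM blah, LATERAL jsonb_each_text(to_jsonb(blah)) kv(k, v)
WHERE v::integer = 1 AND k <> 'id';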