PostgreSQL Crosstab issues / "Return and SQL tuple descriptions are incompatible" - postgresql

Good afternoon, I am using PostgreSQL version 9.2 and I'm trying to use the crosstab function to transpose two columns on a table so that I can later join the result to a different SELECT query.
I have installed the tablefunc extension.
However, I keep getting the "Return and SQL tuple descriptions are incompatible" error, which seems to be caused by the typecasts.
I don't need them to be a specific type.
My original SELECT query is this:
SELECT inventoryid, ttype, tamount
FROM inventorytesting
Which gives me the following result:
inventoryid      ttype tamount
2451530088940460 7     0.2
2451530088940460 2     0.5
2451530088940460 8     0.1
2451530088940460 1     15.7
8751530077940461 7     0.7
8751530077940461 2     0.2
8751530077940461 8     1.1
8751530077940461 1     19.2
and my goal is to get it like:
inventoryid      7    2    8    1
8751530077940461 0.7  0.2  1.1  19.2
2451530088940460 0.2  0.5  0.1  15.7
The 'ttype' field has 49 different values, such as 7, 2, 8 and 1, which are fixed.
The 'tamount' field varies depending on the 'inventoryid' field, but there will always be 49 values per inventoryid, even if a value is zero. It will never be null.
I have tried a few variations that I could find on the internet, which sum up to this:
SELECT *
FROM crosstab (
$$SELECT inventoryid, ttype, tamount
FROM inventorytesting
WHERE inventoryid = '2451530088940460'
ORDER BY inventoryid, ttype$$
)
AS ct("inventoryid" text,"ttype" smallint,"tamount" numeric)
The field types on the inventorytesting table are:
select column_name, data_type from information_schema.columns
where table_name = 'inventorytesting'
Results:
column_name  data_type
id           bigint
ttype        smallint
tamount      numeric
tunit        text
tlessthan    smallint
plantid      text
sessiontime  bigint
deleted      smallint
inventoryid  text
docdata      text
docname      text
labid        bigint
Any pointers would be great.

demo:db<>fiddle
The column definition list has to describe the table structure you are expecting - the pivoted one - and not the structure of the input:
SELECT *
FROM crosstab(
$$SELECT inventoryid, ttype, tamount
FROM inventorytesting
WHERE inventoryid = '2451530088940460'
ORDER BY inventoryid, ttype$$
)
AS ct("inventoryid" text,"type1" numeric,"type2" numeric,"type7" numeric,"type8" numeric)
Additionally, there is no need to use the crosstab function. You can achieve the pivot by simply using standard CASE expressions:
SELECT
    inventoryid,
    SUM(CASE WHEN ttype = 1 THEN tamount END) AS type1,
    SUM(CASE WHEN ttype = 2 THEN tamount END) AS type2,
    SUM(CASE WHEN ttype = 7 THEN tamount END) AS type7,
    SUM(CASE WHEN ttype = 8 THEN tamount END) AS type8
FROM inventorytesting
GROUP BY 1
If you were on 9.4 or higher, you could use the Postgres-specific FILTER clause:
SELECT
    inventoryid,
    SUM(tamount) FILTER (WHERE ttype = 1) AS type1,
    SUM(tamount) FILTER (WHERE ttype = 2) AS type2,
    SUM(tamount) FILTER (WHERE ttype = 7) AS type7,
    SUM(tamount) FILTER (WHERE ttype = 8) AS type8
FROM inventorytesting
GROUP BY 1
demo:db<>fiddle
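Side note: since the question mentions 49 fixed ttype values, typing out 49 aggregate lines by hand gets tedious. A hedged sketch of one way around that is to let Postgres generate the SELECT list for you and then run the resulting string as a second query (the FILTER flavor shown here needs 9.4+; on 9.2 you would generate CASE expressions the same way):
-- Builds the pivot query text from the distinct ttype values;
-- run the generated string as a second step (or from application code).
SELECT 'SELECT inventoryid, '
       || string_agg(
            format('SUM(tamount) FILTER (WHERE ttype = %s) AS type%s', ttype, ttype),
            ', ' ORDER BY ttype)
       || ' FROM inventorytesting GROUP BY 1'
FROM (SELECT DISTINCT ttype FROM inventorytesting) s;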

With crosstab, you define the actual result table (basically the result of the pivot). The input query provides three columns, which are processed as:
- the grouping column, which becomes the result rows
- the pivot column
- the value for the pivot column
In your case, the crosstab therefore needs to be defined as:
ct(
"inventoryid" text,
"tamount_1" numeric,
"tamount_2" numeric,
"tamount_3" numeric,
...
)
The column headers will then correlate with the values of the ttype column, in the order defined by the inner query's ORDER BY.
The catch with crosstab is that when values for ttype are missing (e.g. some value is returned for 4 but not for 3), the resulting columns shift: you would get 1, 2, 4, ... with 3 missing. If you need consistent output, you have to make sure that your inner query returns at least a NULL row for every ttype (e.g. via a LEFT JOIN against the full list of types); see also the sketch below.
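A minimal sketch of another way to keep the columns stable, assuming the sample data above: the two-argument form of crosstab takes a second query that enumerates all category values, so missing ttype values become NULL columns instead of shifting everything left (with the real 49 fixed values you would list all 49 output columns):
-- Two-argument crosstab: the second query pins down the category list,
-- so every ttype gets its own output column even when a row is missing.
SELECT *
FROM crosstab(
    $$SELECT inventoryid, ttype, tamount
      FROM inventorytesting
      ORDER BY 1, 2$$,
    $$SELECT DISTINCT ttype FROM inventorytesting ORDER BY 1$$
)
AS ct("inventoryid" text, "type1" numeric, "type2" numeric,
      "type7" numeric, "type8" numeric);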

Related

How to find nearest entries before and after a value in PostgreSQL

Similar to this question: How to find the first nearest value up and down in SQL?
I have a Postgres DB Table named prices structured as:
id | column_1 | column_2 | column_3 | date_col
1  | 1.5      | 1.7      | 1.6      | 1234560000
2  | 0.9      | 1.1      | 1.0      | 1234570000
3  | 11.5     | 23.5     | 17.5     | 1234580000
4  | 8.3      | 12.3     | 10.3     | 1234600000
I'm trying to select either the row that exactly matches an input date:
Example #1: input query for date_col = 1234580000 would return...
id | column_1 | column_2 | column_3 | date_col
3  | 11.5     | 23.5     | 17.5     | 1234580000
or if that date does not exist, then retrieve the entries immediately before and after:
Example #2: input query for date_col = 1234590000 would return...
id | column_1 | column_2 | column_3 | date_col
3  | 11.5     | 23.5     | 17.5     | 1234580000
4  | 8.3      | 12.3     | 10.3     | 1234600000
I attempted to mess around with the code from the similar question, but I got stuck and resorted to a workaround: get the original query date, check in Python whether the DB returned anything, and if not, send a second query with a broad date range and iterate over the returned result in Python, making the range larger until something comes back... which I know is the wrong way, but my brain is too smooth for this haha
Current code that works when the entry does exist, but does not work when the entry does not exist:
SELECT *
FROM prices, (SELECT id, next_val, last_val
FROM (SELECT t.*,
LEAD(t.id, 1) OVER (ORDER BY t.date_col) as next_val,
LAG(t.id, 1) OVER (ORDER BY t.date_col) as last_val
FROM prices AS t) AS s
WHERE 1234580000 IN (s.date_col, s.next_val, s.last_val)) AS x
WHERE prices.id = x.id OR prices.id = x.next_val OR prices.id = x.last_val
Based on the accepted answer, this worked like a charm:
SELECT *
FROM (SELECT * FROM prices WHERE prices.date_col <= 1234580000 ORDER BY prices.date_col DESC LIMIT 1) AS a
UNION
(SELECT * FROM prices WHERE prices.date_col >= 1234580000 ORDER BY prices.date_col ASC LIMIT 1)
I guess the simplest solution would be a UNION. Note that in PostgreSQL each branch needs its own parentheses when it carries its own ORDER BY and LIMIT; otherwise those clauses are taken to apply to the whole UNION:
(SELECT *
 FROM prices
 WHERE date_col <= 1234580000
 ORDER BY date_col DESC
 LIMIT 1)
UNION
(SELECT *
 FROM prices
 WHERE date_col >= 1234580000
 ORDER BY date_col ASC
 LIMIT 1)
When the exact date exists, both branches return that same row and UNION removes the duplicate, so you get one row for an exact match and two rows otherwise.

Get the next (or previous) non-null value in multiple partitioned columns

Sample data below.
I want to clean up data based on the next non-null value of the same id, ordered by row (actually a timestamp).
I can't use lag(), because in some cases there are consecutive nulls.
I can't use coalesce(a.col_a, (select min(b.col_a) from table b where a.id = b.id)) because it can return an "outdated" value (e.g. NYC instead of SF in col_a row 3). (I can fall back to that, once I've accounted for everything else, for the cases where there is no next non-null value, like col_b rows 9/10, to just fill in the last known value.)
The only thing I can think of is to do
table_x as (select id, col_x from table where col_x is not null)
for each column, and then join, taking the minimum where id = id and table_x.row > table.row. But I have a handful of columns, and that feels cumbersome and inefficient.
Appreciate any help!
row | id | col_a | col_a_desired | col_b | col_b_desired
0   | 1  | -     | NYC           | red   | red
1   | 1  | NYC   | NYC           | red   | red
2   | 1  | SF    | SF            | -     | blue
3   | 1  | -     | SF            | -     | blue
4   | 1  | SF    | SF            | blue  | blue
5   | 2  | PAR   | PAR           | red   | red
6   | 2  | LON   | LON           | -     | blue
7   | 2  | LON   | LON           | -     | blue
8   | 2  | -     | LON           | blue  | blue
9   | 2  | LON   | LON           | -     | blue
10  | 2  | -     | LON           | -     | blue
Can you try this query?
WITH samp AS (
  SELECT 0 row_id, 1 id, NULL col_a, 'red' col_b UNION ALL
  SELECT 1, 1, 'NYC', 'red' UNION ALL
  SELECT 2, 1, 'SF', NULL UNION ALL
  SELECT 3, 1, NULL, NULL UNION ALL
  SELECT 4, 1, 'SF', 'blue' UNION ALL
  SELECT 5, 2, 'PAR', 'red' UNION ALL
  SELECT 6, 2, 'LON', NULL UNION ALL
  SELECT 7, 2, 'LON', NULL UNION ALL
  SELECT 8, 2, NULL, 'blue' UNION ALL
  SELECT 9, 2, 'LON', NULL UNION ALL
  SELECT 10, 2, NULL, NULL
)
SELECT
  row_id,
  id,
  IFNULL(FIRST_VALUE(col_a IGNORE NULLS)
           OVER (PARTITION BY id ORDER BY row_id
                 ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING),
         FIRST_VALUE(col_a IGNORE NULLS)
           OVER (PARTITION BY id ORDER BY row_id DESC
                 ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)) AS col_a,
  IFNULL(FIRST_VALUE(col_b IGNORE NULLS)
           OVER (PARTITION BY id ORDER BY row_id
                 ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING),
         FIRST_VALUE(col_b IGNORE NULLS)
           OVER (PARTITION BY id ORDER BY row_id DESC
                 ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)) AS col_b
FROM samp
ORDER BY id, row_id
References:
https://cloud.google.com/bigquery/docs/reference/standard-sql/navigation_functions#first_value
https://cloud.google.com/bigquery/docs/reference/standard-sql/window-function-calls
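Note that IGNORE NULLS is BigQuery syntax; PostgreSQL does not support it in window functions. A comparable sketch for Postgres, under the assumption that the sample table is named samp with the columns shown, is to aggregate each window frame into an array and strip the NULLs:
-- "Next non-null, else previous non-null" in PostgreSQL: array_agg over the
-- forward frame collects the current and following values, array_remove
-- strips the NULLs, and element [1] is the nearest non-null. The backward
-- window (row_id DESC) supplies the fallback for trailing NULLs.
SELECT row_id, id,
       COALESCE(
         (array_remove(array_agg(col_a) OVER w_fwd, NULL))[1],
         (array_remove(array_agg(col_a) OVER w_bwd, NULL))[1]
       ) AS col_a,
       COALESCE(
         (array_remove(array_agg(col_b) OVER w_fwd, NULL))[1],
         (array_remove(array_agg(col_b) OVER w_bwd, NULL))[1]
       ) AS col_b
FROM samp
WINDOW
  w_fwd AS (PARTITION BY id ORDER BY row_id
            ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING),
  w_bwd AS (PARTITION BY id ORDER BY row_id DESC
            ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
ORDER BY id, row_id;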
I want to clean up data based on the next non-null value.
So if you reverse the order, that's the last non-null value.
If you have multiple columns and the logic is too cumbersome to write in SQL, you can write it in plpgsql instead, or even use the script language of your choice (but that will be slower).
The idea is to open a cursor for update, with an ORDER BY in the reverse order mentioned in the question. Then the plpgsql code stores the last non-null values in variables, and if needed issues an UPDATE WHERE CURRENT OF cursor to replace the nulls in the table with desired values.
This may take a while, and the numerous updates will take a lot of locks. It looks like your data can be processed in independent chunks using the "id" column as chunk identifier, so it would be a good idea to use that.
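A minimal sketch of that cursor approach, assuming a table named t with the columns from the question ("row" is quoted because it clashes with a keyword); treat it as an outline rather than a drop-in implementation:
DO $$
DECLARE
  cur CURSOR FOR
    SELECT id, col_a, col_b
    FROM t
    ORDER BY id, "row" DESC   -- reverse order: the next non-null becomes the last one seen
    FOR UPDATE;
  last_id int;
  last_a  text;
  last_b  text;
BEGIN
  FOR r IN cur LOOP
    IF last_id IS DISTINCT FROM r.id THEN
      -- new chunk: forget the values carried over from the previous id
      last_a := NULL;
      last_b := NULL;
      last_id := r.id;
    END IF;
    IF r.col_a IS NULL OR r.col_b IS NULL THEN
      UPDATE t
      SET col_a = COALESCE(col_a, last_a),
          col_b = COALESCE(col_b, last_b)
      WHERE CURRENT OF cur;
    END IF;
    -- remember the most recent non-null value of each column
    last_a := COALESCE(r.col_a, last_a);
    last_b := COALESCE(r.col_b, last_b);
  END LOOP;
END $$;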

Update null values in a column based on non null values percentage of the column

I need to update the NULL values of a column in a table, for each category, based on the percentage of the non-null values. There are only two types of values in the column, CV and NCV. The number of rows with NULL values is 7; I need to randomly populate the NULL values based on the percentage share of the non-null values: 38% (CV) of 7 = 3 rows, 63% (NCV) of 7 = 4 rows.
If you want to dynamically calculate the "NULL rate", one way to do it could be:
with pcts as (
    select
        (select count(*)::numeric from the_table where type = 'cv')
          / (select count(*) from the_table where type is not null) as cv_pct,
        (select count(*)::numeric from the_table where type = 'ncv')
          / (select count(*) from the_table where type is not null) as ncv_pct,
        (select count(*) from the_table where type is null) as null_count
), calc as (
    select d.ctid,
           p.cv_pct,
           p.ncv_pct,
           row_number() over () as rn,
           case
             when row_number() over () <= round(null_count * p.cv_pct) then 'cv'
             else 'ncv'
           end as new_type
    from the_table d
    cross join pcts p
    where type is null
)
update the_table t
set type = c.new_type
from calc c
where t.ctid = c.ctid
The first CTE calculates the percentage of each type and the total number of NULL values (in theory the percentage of the NCV type isn't really needed, but I included it for completeness).
The second CTE then calculates, for each NULL row, which new type should be used. This is done by comparing the "current" row number with the NULL count multiplied by the expected percentage (the CASE expression): with the sample numbers, round(7 * 0.38) = 3, so the first three NULL rows become 'cv' and the remaining four become 'ncv'.
This is then used to update the target table. I have used the ctid as an alternative to a primary key, because your sample data does not have any unique column (or combination of columns). If you do have a primary key that you haven't shown, replace ctid with that primary key column.
I wouldn't be surprised, though, if there were a shorter, more efficient way to do it, but for now I can't think of a better alternative.
Online example
If you are on PG11 or later, you can use the GROUPS frame mode to do this in what should be close to a single pass (except for the reordering needed to output by tid), using window functions:
select tid, category, id, type,
case
when type is not null then type
when round(
(count(*) over (partition by category
order by type nulls last
groups between 2 preceding
and 2 preceding))::numeric /
coalesce(
nullif(
count(*) over (partition by category
order by type nulls last
groups 2 preceding
exclude group), 0), 1
) *
count(*) over (partition by category
order by type nulls last
groups current row)
) >= row_number() over (partition by category, type
order by tid)
then
first_value(type) over (partition by category
order by type nulls last
groups between 2 preceding
and 2 preceding)
else
first_value(type) over (partition by category
order by type nulls last
groups 1 preceding
exclude group)
end as extended_type
from cv_ncv
order by tid;
Working fiddle here.

If there is only one zero value then group by supplier and show zero, if there is no zero, then avg all values

I will give you an example of the table that I have:
Supplier | Value
sup1 | 4
sup2 | 1
sup1 | 0
sup1 | 3
sup2 | 5
I need a result that averages Value by supplier, but if there is a value of 0 for a supplier, does not average and returns 0 instead.
It should look like this:
Supplier | Value
sup1 | 0
sup2 | 3
This is a little trick, but it should work:
SELECT Supplier,
CASE WHEN MIN(ABS(Value)) = 0 THEN 0 ELSE AVG(Value) END
FROM TableTest
GROUP BY Supplier
EDIT: Using the ABS() function lets you avoid problems with negative values.
-- Table variables are declared with @ (a # prefix would denote a temp table)
DECLARE @TAB TABLE (SUPPLIER VARCHAR(50), VALUE INTEGER)

INSERT INTO @TAB
SELECT 'sup1', 4
UNION ALL
SELECT 'sup2', 1
UNION ALL
SELECT 'sup1', 0
UNION ALL
SELECT 'sup1', 3
UNION ALL
SELECT 'sup2', 5

SELECT * FROM @TAB

SELECT T1.SUPPLIER,
       CASE WHEN EXISTS (SELECT 1 FROM @TAB T WHERE T.SUPPLIER = T1.SUPPLIER AND T.VALUE = 0)
            THEN 0
            ELSE AVG(T1.VALUE)
       END AS VALUE
FROM @TAB T1
GROUP BY T1.SUPPLIER
Result
SUPPLIER VALUE
sup1 0
sup2 3
Using the following query is one of the ways to do it.
First I pick out the suppliers which have a Value of 0; then, based on that result, I do the remaining calculation and finally use UNION to get the expected result:
DECLARE @ZeroValue TABLE (Supplier VARCHAR (20));

INSERT INTO @ZeroValue (Supplier)
SELECT Supplier FROM TestTable WHERE Value = 0;

SELECT Supplier, 0 AS Value FROM @ZeroValue
UNION
SELECT T.Supplier, AVG(T.Value) AS Value
FROM TestTable T
-- NOT EXISTS keeps only suppliers without any zero row; an inequality JOIN
-- would misbehave as soon as more than one supplier has a zero value
WHERE NOT EXISTS (SELECT 1 FROM @ZeroValue Z WHERE Z.Supplier = T.Supplier)
GROUP BY T.Supplier;
Schema used for the sample:
CREATE TABLE TestTable (Supplier VARCHAR (20), Value INT);
INSERT INTO TestTable (Supplier, Value) VALUES
('sup1', 4), ('sup2', 1), ('sup1', 0), ('sup1', 3), ('sup2', 5);
Please find the working demo on db<>fiddle

TSQL: Inserting missing records into table

I am stuck on this T-SQL query.
I have the table below:
Age SectioName Cost
---------------------
1 Section1 100
2 Section1 200
1 Section2 500
3 Section2 100
4 Section2 200
Let's say that for each section I can have a maximum of 5 Ages. In the table above, some Ages are missing. How do I insert the missing Ages for each section (possibly without using a cursor)? The Cost would be zero for the missing Ages.
So after the insertion the table should look like
Age SectioName Cost
---------------------
1 Section1 100
2 Section1 200
3 Section1 0
4 Section1 0
5 Section1 0
1 Section2 500
2 Section2 0
3 Section2 100
4 Section2 200
5 Section2 0
EDIT1
I should have been clearer in my question. The maximum age is a dynamic value. It could be 5, 6, 10 or some other value, but it will always be less than 25.
I think I got it:
;WITH tally AS
(
    SELECT 1 AS r
    UNION ALL
    SELECT r + 1 AS r
    FROM tally
    WHERE r < 5 -- this value could be dynamic now
)
SELECT n.r, t.SectionName, 0 AS Cost
FROM (SELECT DISTINCT SectionName FROM TempFormsSectionValues) t
CROSS JOIN tally n
WHERE NOT EXISTS
      (SELECT * FROM TempFormsSectionValues
       WHERE YearsAgo = n.r AND SectionName = t.SectionName)
ORDER BY t.SectionName, n.r
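To actually make the bound dynamic and do the insert in one statement, here is a hedged sketch of the same idea with the limit in a variable (@MaxAge and the INSERT column list are assumptions about the schema); since the bound stays under 25, the default recursion limit of 100 is not a concern:
DECLARE @MaxAge int = 10;  -- 5, 6, 10, ... (always < 25 per the question)

;WITH tally AS
(
    SELECT 1 AS r
    UNION ALL
    SELECT r + 1 AS r
    FROM tally
    WHERE r < @MaxAge
)
INSERT INTO TempFormsSectionValues (YearsAgo, SectionName, Cost)  -- column list assumed
SELECT n.r, t.SectionName, 0 AS Cost
FROM (SELECT DISTINCT SectionName FROM TempFormsSectionValues) t
CROSS JOIN tally n
WHERE NOT EXISTS
      (SELECT * FROM TempFormsSectionValues x
       WHERE x.YearsAgo = n.r AND x.SectionName = t.SectionName);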
You can use this query to select the missing values:
select n.num, t.SectioName, 0 as Cost
from (select distinct SectioName from table1) t
cross join
(select 1 as num union select 2 union select 3 union select 4 union select 5) n
where not exists
(select * from table1 where table1.age = n.num and table1.SectioName = t.SectioName)
It creates a Cartesian product of sections and the numbers 1 to 5, and then selects those combinations that don't exist yet. You can then use this query as the source of an INSERT into your table.
SQL Fiddle (it has an ORDER BY added to check the results more easily, but that's not necessary for the insert).
Use the query below to generate the missing rows:
SELECT t1.Age,t1.Section,ISNULL(t2.Cost,0) as Cost
FROM
(
SELECT 1 as Age,'Section1' as Section,0 as Cost
UNION
SELECT 2,'Section1',0
UNION
SELECT 3,'Section1',0
UNION
SELECT 4,'Section1',0
UNION
SELECT 5,'Section1',0
UNION
SELECT 1,'Section2',0
UNION
SELECT 2,'Section2',0
UNION
SELECT 3,'Section2',0
UNION
SELECT 4,'Section2',0
UNION
SELECT 5,'Section2',0
) as t1
LEFT JOIN test t2
ON t1.Age=t2.Age AND t1.Section=t2.Section
ORDER BY Section,Age
SQL Fiddle
You can utilize the above result set to insert the missing rows, using the EXCEPT operator to exclude rows already existing in the table:
INSERT INTO test
SELECT t1.Age,t1.Section,ISNULL(t2.Cost,0) as Cost
FROM
(
SELECT 1 as Age,'Section1' as Section,0 as Cost
UNION
SELECT 2,'Section1',0
UNION
SELECT 3,'Section1',0
UNION
SELECT 4,'Section1',0
UNION
SELECT 5,'Section1',0
UNION
SELECT 1,'Section2',0
UNION
SELECT 2,'Section2',0
UNION
SELECT 3,'Section2',0
UNION
SELECT 4,'Section2',0
UNION
SELECT 5,'Section2',0
) as t1
LEFT JOIN test t2
ON t1.Age=t2.Age AND t1.Section=t2.Section
EXCEPT
SELECT Age,Section,Cost
FROM test
SELECT * FROM test
ORDER BY Section,Age
http://www.sqlfiddle.com/#!3/d9035/11