How to replace nulls with zeros in PostgreSQL crosstabs

I have a product table with product_id and 100+ attributes. The product_id is text, whereas the attribute columns are integer, i.e. 1 if the attribute exists. When the PostgreSQL crosstab is run, non-matching attributes return null values. How do I replace the nulls with zeros instead?
SELECT ct.*
INTO ct3
FROM crosstab(
'SELECT account_number, attr_name, sub FROM products ORDER BY 1,2',
'SELECT DISTINCT attr_name FROM attr_names ORDER BY 1')
AS ct(
account_number text,
Attr1 integer,
Attr2 integer,
Attr3 integer,
Attr4 integer,
...
)
Replace this result:
account_number Attr1 Attr2 Attr3 Attr4
1.00000001 1 null null null
1.00000002 null null 1 null
1.00000003 null null 1 null
1.00000004 1 null null null
1.00000005 1 null null null
1.00000006 null null null 1
1.00000007 1 null null null
with this below:
account_number Attr1 Attr2 Attr3 Attr4
1.00000001 1 0 0 0
1.00000002 0 0 1 0
1.00000003 0 0 1 0
1.00000004 1 0 0 0
1.00000005 1 0 0 0
1.00000006 0 0 0 1
1.00000007 1 0 0 0
A workaround would be to do a select account_number, coalesce(Attr1,0)... on the result. But typing out coalesce for each of the 100+ columns is rather unwieldy. Is there a way to handle this using crosstab? Thanks

You can use coalesce:
select account_number,
coalesce(Attr1, 0) as Attr1,
coalesce(Attr2, 0) as Attr2,
etc

if you can put those Attrs into a table like
attr
-----
Attr1
Attr2
Attr3
...
then you could automatically generate the repeating coalesce statement like
SELECT 'coalesce("' || attr || '", 0) "'|| attr ||'",' from table;
to save some typing.
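The same list can also be generated outside the database. A minimal sketch in Python, assuming the attribute names are already available as a list; the helper names `quote_ident` and `coalesce_select_list` and the sample names are made up for illustration. Note that quoted identifiers are case-sensitive in PostgreSQL, and unquoted names such as Attr1 in the AS ct(...) list are folded to lower case, so the sample uses lower-case names.

```python
def quote_ident(name):
    """Quote a SQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def coalesce_select_list(attrs, default=0):
    """Build 'coalesce("a", 0) AS "a"' for every attribute name, one per line."""
    return ",\n".join(
        f"coalesce({quote_ident(a)}, {default}) AS {quote_ident(a)}" for a in attrs
    )


# Hypothetical attribute names; in practice they would be read from attr_names.
attrs = ["attr1", "attr2", "attr3"]

# ct3 is the table the crosstab result was selected INTO in the question.
print("SELECT account_number,\n" + coalesce_select_list(attrs) + "\nFROM ct3;")
```

The printed statement can then be pasted into psql or executed by the calling program.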

Related

Create 2 new conditional columns with dependence postgresql

I want to create two new columns from a query in PostgreSQL, one depending on existing data and the other depending on the new column, i.e.
existing_col new_col new_col2
a 1 2
b 0 0
I have tried:
select existing_col,
case when existing_col like 'a' then 1 else 0 end as new_col,
case when new_col like 1 then 2 else 0 end as new_col2
from table
However, this gives me an error saying that new_col doesn't exist. How can I achieve this?
updated:
(I modified your query a little to avoid the like operator for integers)
t=# create table "table" (existing_col text);
CREATE TABLE
Time: 50.189 ms
t=# insert into "table" values('a'),('b');
INSERT 0 2
Time: 0.911 ms
t=# select *,case when new_col like 1 then 2 else 0 end as new_col2
t-# from (
t(# select existing_col,
t(# case when existing_col like 'a' then 1 else 0 end as new_col
t(# from "table") al
t-# ;
ERROR: operator does not exist: integer ~~ integer
LINE 1: select *,case when new_col like 1 then 2 else 0 end as new_c...
^
HINT: No operator matches the given name and argument type(s). You might need to add explicit type casts.
Time: 0.514 ms
t=# select *,case when new_col = 1 then 2 else 0 end as new_col2
t-# from (
t(# select existing_col,
t(# case when existing_col like 'a' then 1 else 0 end as new_col
t(# from "table") al
t-# ;
existing_col | new_col | new_col2
--------------+---------+----------
a | 1 | 2
b | 0 | 0
(2 rows)
Time: 0.347 ms
as in docs:
CASE WHEN condition THEN result
[WHEN ...]
[ELSE result]
END
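The same idea can be written with a CTE instead of a derived table: compute new_col in one step, then refer to it by name in the next. The sketch below runs against an in-memory SQLite database only so that it is self-contained; the WITH syntax it uses is also accepted by PostgreSQL. The CTE name `step1` is made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "table" (existing_col TEXT)')
conn.executemany('INSERT INTO "table" VALUES (?)', [("a",), ("b",)])

# The CTE computes new_col first; the outer SELECT can then refer to it by name.
query = """
WITH step1 AS (
    SELECT existing_col,
           CASE WHEN existing_col = 'a' THEN 1 ELSE 0 END AS new_col
    FROM "table"
)
SELECT existing_col,
       new_col,
       CASE WHEN new_col = 1 THEN 2 ELSE 0 END AS new_col2
FROM step1
ORDER BY existing_col
"""
rows = conn.execute(query).fetchall()
print(rows)
```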

T-SQL: Timeseries filling ranges

I have this dataset where I have a time series in the YYYYMM format. I have two columns which basically act as true/false flags. I would like to add two extra columns, based on these true/false flags, that retrieve the current range:
Default Cure
201301 0 NULL
201302 0 NULL
201303 0 NULL
201304 1 NULL
201305 1 NULL
201306 1 NULL
201307 1 NULL
201308 NULL 0
201309 NULL 0
201310 NULL 1
201311 0 NULL
201312 0 NULL
201401 0 NULL
201402 0 NULL
201403 1 NULL
201404 1 NULL
201405 0 NULL
201406 0 NULL
201407 NULL 1
201408 NULL 0
201409 NULL 0
201410 NULL 0
201411 NULL 0
201412 NULL 0
In this dataset you can see the Default column being set to 1 for the periods 201304, 05, 06, 07 and the Cure column being set to 1 in the period 201310.
This basically means the Default timeseries is valid from period 201304 until period 201310. Ultimately I would like to generate the following set:
Default Cure DefaultPeriod CurePeriod
201301 0 NULL NULL NULL
201302 0 NULL NULL NULL
201303 0 NULL NULL NULL
201304 1 NULL 201304 201310
201305 1 NULL 201304 201310
201306 1 NULL 201304 201310
201307 1 NULL 201304 201310
201308 NULL 0 201304 201310
201309 NULL 0 201304 201310
201310 NULL 1 201304 201310
201311 0 NULL NULL NULL
201312 0 NULL NULL NULL
201401 0 NULL NULL NULL
201402 0 NULL NULL NULL
201403 1 NULL 201403 201407
201404 1 NULL 201403 201407
201405 0 NULL 201403 201407
201406 0 NULL 201403 201407
201407 NULL 1 201403 201407
201408 NULL 0 NULL NULL
201409 NULL 0 NULL NULL
201410 NULL 0 NULL NULL
201411 NULL 0 NULL NULL
201412 NULL 0 NULL NULL
Multiple ranges can occur, but they cannot overlap. How would I go about achieving this? I have tried all sorts of min/max period joins on the same table, but I can't seem to find a working solution.
This was a real thinker :)
Basically I am dividing up the data on the "Cure" dates (c1), numbering each group (c2), then looking for the mins and maxes within each group (c3, c4), then applying some logic to filter out the rows that come before the min.
declare @t table
(
[Month] varchar(6),
[Default] bit,
[Cure] bit
);
insert into @t values('201301', 0, NULL);
insert into @t values('201302', 0, NULL);
insert into @t values('201303', 0, NULL);
insert into @t values('201304', 1, NULL);
insert into @t values('201305', 1, NULL);
insert into @t values('201306', 1, NULL);
insert into @t values('201307', 1, NULL);
insert into @t values('201308', NULL, 0);
insert into @t values('201309', NULL, 0);
insert into @t values('201310', NULL, 1);
insert into @t values('201311', 0, NULL);
insert into @t values('201312', 0, NULL);
insert into @t values('201401', 0, NULL);
insert into @t values('201402', 0, NULL);
insert into @t values('201403', 1, NULL);
insert into @t values('201404', 1, NULL);
insert into @t values('201405', 0, NULL);
insert into @t values('201406', 0, NULL);
insert into @t values('201407', NULL, 1);
insert into @t values('201408', NULL, 0);
insert into @t values('201409', NULL, 0);
insert into @t values('201410', NULL, 0);
insert into @t values('201411', NULL, 0);
insert into @t values('201412', NULL, 0);
with c1 as
(
select min([Month]) [Month], 1 x from @t
union all
select [Month],1 from @t
where Cure = 1
),
c2 as
(
select t.[Month],[Default],[Cure],
sum(x) over (order by t.[Month] rows between unbounded preceding and 1 preceding) grp
from @t t
left outer join c1 on c1.[Month] = t.[Month]
),
c3 as
(
select grp, min([Month]) [Month]
from c2
where [Default] = 1
group by grp
),
c4 as
(
select grp, max([Month]) [Month]
from c2
where [Cure] = 1
group by grp
)
select c2.[Month], c2.[Default], c2.[Cure],
case when c2.[Month] >= c3.[Month] then c3.[Month] else null end as DefaultPeriod,
case when c2.[Month] >= c3.[Month] then c4.[Month] else null end as CurePeriod
from c2
left outer join c3 on c2.grp = c3.grp
left outer join c4 on c2.grp = c4.grp
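For readers who find the grouping easier to follow procedurally, here is a rough Python sketch of the same idea: close a group at every row where Cure is 1, take the first Default month and the Cure month of each group, and only fill the period columns from the first Default month onwards. It is not a line-by-line port of the query above (for example, it places the very first row in the first group), and the function name `fill_periods` is made up for illustration.

```python
def fill_periods(rows):
    """rows: (month, default, cure) tuples sorted by month; flags are 0, 1 or None.

    Returns (month, default, cure, default_period, cure_period) tuples.
    """
    # Step 1: split the series into groups; a group ends at a row with cure == 1.
    groups, current = [], []
    for row in rows:
        current.append(row)
        if row[2] == 1:
            groups.append(current)
            current = []
    if current:  # trailing rows after the last cure
        groups.append(current)

    # Step 2: per group, find the first default month and the cure month.
    out = []
    for group in groups:
        default_months = [m for m, d, c in group if d == 1]
        cure_months = [m for m, d, c in group if c == 1]
        start = min(default_months) if default_months else None
        end = max(cure_months) if cure_months else None
        # Step 3: rows before the first default month keep empty period columns.
        for m, d, c in group:
            if start is not None and m >= start:
                out.append((m, d, c, start, end))
            else:
                out.append((m, d, c, None, None))
    return out


# A shortened, made-up series in the same shape as the question's data.
sample = [
    ("201301", 0, None),
    ("201302", 1, None),
    ("201303", None, 0),
    ("201304", None, 1),
    ("201305", 0, None),
]
for line in fill_periods(sample):
    print(line)
```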

How to Pivot on caption?

I am trying to pivot rows into columns with T-SQL and also eliminate NULLs. How do I do this? My current query:
IF OBJECT_ID(N'tempdb..#test_data') IS NOT NULL drop table #test_data
create table #test_data (
question_caption varchar(max),
[0] varchar(max),
[1] varchar(max),
[2] varchar(max),
[3] varchar(max))
insert #test_data values('q1','abc',Null,Null,Null)
insert #test_data values('q2',Null,'def',Null,Null)
insert #test_data values('q3',Null,Null,'ghi',Null)
insert #test_data values('q4',Null,Null,Null,'jkl')
select * from #test_data
pivot (
Max([0])
For question_caption in ([0],[1],[2],[3])
) as PivotTable
Output:
question_caption 0 1 2 3
q1 abc NULL NULL NULL
q2 NULL def NULL NULL
q3 NULL NULL ghi NULL
q4 NULL NULL NULL jkl
What I want:
q1 q2 q3 q4
abc def ghi jkl
How can I achieve this? The above query has the error:
Msg 265, Level 16, State 1, Line 4
The column name "0" specified in the PIVOT operator conflicts with the existing column name in the PIVOT argument.
I have tried multiple Pivot examples, but all of them have resulted in one error or another.
You can do this with a simple max(case ...):
select [q1]=max(case when question_caption = 'q1' then [0] else null end),
[q2]=max(case when question_caption = 'q2' then [1] else null end),
[q3]=max(case when question_caption = 'q3' then [2] else null end),
[q4]=max(case when question_caption = 'q4' then [3] else null end)
from #test_data
or the pivot:
select [q1], [q2], [q3], [q4]
from ( select question_caption,
coalesce([0],[1],[2],[3])
from #test_data
) s (c, v)
pivot (max(v) for c in ([q1], [q2], [q3], [q4])) p
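The max(case ...) form is portable, so it can be tried outside SQL Server as well. The sketch below generates one such column per caption and runs it against an in-memory SQLite database; this is only to keep the example self-contained, and the table name `test_data` (without the # prefix) and the fixed caption list are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE test_data (question_caption TEXT, "0" TEXT, "1" TEXT, "2" TEXT, "3" TEXT)'
)
conn.executemany(
    "INSERT INTO test_data VALUES (?, ?, ?, ?, ?)",
    [
        ("q1", "abc", None, None, None),
        ("q2", None, "def", None, None),
        ("q3", None, None, "ghi", None),
        ("q4", None, None, None, "jkl"),
    ],
)

# Trusted, hard-coded captions; do not interpolate user input into SQL like this.
captions = ["q1", "q2", "q3", "q4"]

# One MAX(CASE ...) column per caption; COALESCE picks whichever answer column is filled.
select_list = ",\n".join(
    f"MAX(CASE WHEN question_caption = '{c}' "
    f'THEN COALESCE("0", "1", "2", "3") END) AS {c}'
    for c in captions
)
row = conn.execute(f"SELECT {select_list} FROM test_data").fetchone()
print(row)
```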

Select only the column that is not null or one specific one if both are not null

In the following table, I have a column called ShortDesc and one called LongDesc. If the ShortDesc is not null, I want to return this value. If the ShortDesc column in a row is null, I want to return the value of the LongDesc. If both the ShortDesc and LongDesc are not null, I only want to return the ShortDesc (the LongDesc needs to be returned as null).
Table Events
ID ShortDesc LongDesc
0 abc null
1 null def
2 ghi jkl
Result:
ID ShortDesc LongDesc
0 abc null
1 null def
2 ghi null
I'm at a loss how to create the SQL for this.
If you want to show both shortDesc and longDesc:
SELECT
shortDesc,
CASE WHEN shortDesc IS NOT NULL THEN NULL ELSE longDesc END AS longDesc
FROM yourTable;
If you just want to show a single desc:
SELECT COALESCE(shortDesc, longDesc) AS description
FROM yourTable;
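Both variants can be checked against the sample rows from the question. The sketch below uses an in-memory SQLite database purely to stay self-contained; CASE and COALESCE behave the same way in other SQL dialects. The alias `Description` is an assumption, chosen because DESC is a reserved word.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Events (ID INTEGER, ShortDesc TEXT, LongDesc TEXT)")
conn.executemany(
    "INSERT INTO Events VALUES (?, ?, ?)",
    [(0, "abc", None), (1, None, "def"), (2, "ghi", "jkl")],
)

rows = conn.execute(
    """
    SELECT ID,
           ShortDesc,
           -- LongDesc is only shown when ShortDesc is missing
           CASE WHEN ShortDesc IS NULL THEN LongDesc END AS LongDesc,
           -- single-column variant: first non-null of the two
           COALESCE(ShortDesc, LongDesc) AS Description
    FROM Events
    ORDER BY ID
    """
).fetchall()
for r in rows:
    print(r)
```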

How to design T-SQL query to calculate sum in one pass?

I am trying to develop a T-SQL query which will do the following:
ROUND(100 * A / B, 1)
Simple in concept, but it's tricky because of possible B=0 denominator and also because of A and B variables. What I expect is a percent value like 93.2 (given in this format without %). Or even 932 would be acceptable since I could convert it later.
But instead, I'm currently getting 151, which is the number of records.
A = CASE WHEN A.MFG IS NULL AND A.MFG2 IS NULL AND A.QC IS NULL AND A.QC2 IS NULL THEN 1 ELSE 0 END
B = CASE WHEN [Date_Completed] IS NOT NULL THEN 1 ELSE 0 END
My current logic only divides A/B if B is not equal to zero. Can you please help me fix this? p.s. all fields above are from the same table A.
I tried:
SELECT CASE WHEN t.VarB<>0 THEN ROUND(100 * t.VarA / t.VarB, 1)
ELSE 0 /* or whatever you'd want to return in this case */
END
FROM (SELECT CASE WHEN A.MFG IS NULL AND A.MFG2 IS NULL AND A.QC IS NULL AND A.QC2 IS NULL THEN 1
ELSE 0
END AS VarA,
CASE WHEN [Date_Completed] IS NOT NULL THEN 1
ELSE 0
END AS VarB
FROM EXCEL.Batch_Records A) t
But I got 33000 rows returned instead of just one, where each row = 100 or 0.
Good idea, Conrad! I tested your solution and it works if I just want that one value. But what I didn't tell you was that there are additional values I need returned from the same query. When I tried adding in the other value calculations, I got syntax errors. So here is my current query. How should this be rewritten, please?
select
SUM(CASE WHEN A.DATE_RECEIVED IS NOT NULL THEN 1 ELSE 0 END) AS NUM_RECEIVED,
SUM(CASE WHEN [Date_Completed] IS NOT NULL THEN 1 ELSE 0 END) AS NUM_COMPLETE_OF_OPENED,
SUM(CASE WHEN A.DATE_COMPLETED IS NOT NULL THEN 1 ELSE 0 END) AS NUM_COMPLETED_IN_MONTH,
SUM(CASE WHEN A.MFG IS NULL AND A.MFG2 IS NULL AND A.QC IS NULL AND A.QC2 IS NULL THEN 1 ELSE 0 END) AS NUM_WITHOUT_ERROR,
round(100 * a/b , 1)
from
(select
sum(CASE
WHEN A.MFG IS NULL AND A.MFG2 IS NULL AND A.QC IS NULL AND A.QC2 IS NULL THEN
1.0
ELSE 0.0 END) A,
sum(CASE WHEN [Date_Completed] IS NOT NULL THEN
1.0 ELSE 0.0 END) B
FROM EXCEL.Batch_Records a
LEFT JOIN EXCEL.QC_CODES d ON a.Part_Number = d.CODE_ID
WHERE (a.[Group] = @GROUP or @GROUP = '' OR @GROUP IS NULL) AND A.Date_Received >= @STARTDATE AND A.Date_Received <= @ENDDATE
Conrad correctly advised me that #TEMP1 was an empty table. But now I populated it and successfully designed this query with his help:
SET @STARTDATE = '1/1/11'
SET @ENDDATE = '1/31/11'
SET @GROUP = 'INTERMEDIATES_FISH'
--SET @TABLE_TITLE = 'BATCH RECORD SUCCESS RATE'
--SET @DEPT = 'QC'
IF EXISTS(SELECT * FROM TEMPDB.INFORMATION_SCHEMA.TABLES WHERE TABLE_NAME LIKE '#TEMP1%')
DROP TABLE #TEMP1
--CREATE TABLE #TEMP1 ( MFG int , MFG2 int , QC int, QC2 INT , [Group] NVARCHAR(MAX), [Date_Completed] datetime, Date_Received datetime)
SELECT
MFG, MFG2, QC, QC2, [GROUP], [DATE_COMPLETED], [DATE_RECEIVED]
INTO #TEMP1
FROM EXCEL.Batch_Records a
WHERE (a.[Group] = @GROUP or @GROUP = '' OR @GROUP IS NULL) AND A.Date_Received >= @STARTDATE AND A.Date_Received <= @ENDDATE
------------------------------------------
;WITH CTE AS
(
SELECT
CASE
WHEN A.MFG IS NULL AND A.MFG2 IS NULL AND A.QC IS NULL AND A.QC2 IS NULL THEN
1.0
ELSE 0.0 END A,
CASE WHEN [Date_Completed] IS NOT NULL THEN 1.0 ELSE 0.0 END B,
CASE WHEN A.Date_Received IS NOT NULL THEN 1 ELSE 0 END NUM_RECEIVED,
CASE WHEN [Date_Completed] IS NOT NULL THEN 1 ELSE 0 END NUM_COMPLETE_OF_OPENED,
CASE WHEN A.DATE_COMPLETED IS NOT NULL THEN 1 ELSE 0 END NUM_COMPLETED_IN_MONTH,
CASE WHEN A.MFG IS NULL AND A.MFG2 IS NULL AND A.QC IS NULL AND A.QC2 IS NULL THEN 1 ELSE 0 END AS NUM_WITHOUT_ERROR
FROM
#TEMP1 a
--WHERE (a.[Group] = @GROUP or @GROUP = '' OR @GROUP IS NULL) AND A.Date_Received >= @STARTDATE AND A.Date_Received <= @ENDDATE
)
select
round(100 * SUM(A)/SUM(b) , 1) ,
SUM(NUM_RECEIVED) NUM_RECEIVED,
SUM(NUM_COMPLETE_OF_OPENED) NUM_COMPLETE_OF_OPENED,
SUM(NUM_COMPLETED_IN_MONTH) NUM_COMPLETED_IN_MONTH,
SUM(NUM_WITHOUT_ERROR) NUM_WITHOUT_ERROR
FROM CTE
Basically you need to use SUM() to get the sum. You should also use 1.0 and 0.0 so you get decimal values rather than integer division.
You should also do the SUM before the division.
UPDATE
Since you're adding in a number of SUM(CASE ...) expressions, it's probably more readable to move the CASE expressions out to a CTE.
CREATE TABLE #Batch_Records (
MFG int ,
MFG2 int ,
QC int,
QC2 INT ,
[Group] int,
[Date_Completed] datetime,
Date_Received datetime)
INSERT INTO #Batch_Records (MFG , MFG2 , QC , QC2 , [Group] , [Date_Completed] , Date_Received )
VALUES (1,null,null,null,1,'1/4/2011','2/4/2011'),
(null,null,null,null,1,'2/2/2011','3/4/2011'),
(1,null,null,null,1,'3/6/2011','4/3/2011'),
(null,null,null,null,1,NULL,'5/4/2011'),
(1,null,null,null,1,'5/4/2011','6/6/2011'),
(1,null,null,null,1,NULL,'7/4/2011')
DECLARE @GROUP int
DECLARE @STARTDATE DateTime
DECLARE @ENDDATE DateTime
SET @GROUP = 1
SET @STARTDATE = '1/1/2001'
SET @ENDDATE = '1/1/2012'
;WITH CTE AS
(
SELECT
CASE
WHEN A.MFG IS NULL AND A.MFG2 IS NULL AND A.QC IS NULL AND A.QC2 IS NULL THEN
1.0
ELSE 0.0 END A,
CASE WHEN [Date_Completed] IS NOT NULL THEN
1.0 ELSE 0.0 END B,
CASE WHEN A.Date_Received IS NOT NULL THEN 1 ELSE 0 END NUM_RECEIVED,
CASE WHEN [Date_Completed] IS NOT NULL THEN 1 ELSE 0 END NUM_COMPLETE_OF_OPENED,
CASE WHEN A.DATE_COMPLETED IS NOT NULL THEN 1 ELSE 0 END NUM_COMPLETED_IN_MONTH,
CASE WHEN A.MFG IS NULL AND A.MFG2 IS NULL AND A.QC IS NULL AND A.QC2 IS NULL THEN 1 ELSE 0 END AS NUM_WITHOUT_ERROR
FROM
#Batch_Records a
WHERE
(a.[Group] = @GROUP or @GROUP = '' OR @GROUP IS NULL)
AND A.Date_Received >= @STARTDATE AND A.Date_Received <= @ENDDATE
)
)
select
round(100 * SUM(A)/SUM(b) , 1) ,
SUM(NUM_RECEIVED) NUM_RECEIVED,
SUM(NUM_COMPLETE_OF_OPENED) NUM_COMPLETE_OF_OPENED,
SUM(NUM_COMPLETED_IN_MONTH) NUM_COMPLETED_IN_MONTH,
SUM(NUM_WITHOUT_ERROR) NUM_WITHOUT_ERROR
FROM CTE
DROP TABLE #Batch_Records
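The question also worries about a zero denominator, which SUM(A)/SUM(b) does not guard against. One common way to handle it is to wrap the denominator in NULLIF(..., 0), so the result becomes NULL instead of raising a divide-by-zero error; NULLIF is available in T-SQL as well. The sketch below shows the pattern on an in-memory SQLite database with a simplified, made-up `batch_records` table (only mfg, qc and date_completed), so it is an illustration of the guard rather than the full query.

```python
import sqlite3


def make_db(rows):
    """Create an in-memory table with (mfg, qc, date_completed) rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE batch_records (mfg INTEGER, qc INTEGER, date_completed TEXT)"
    )
    conn.executemany("INSERT INTO batch_records VALUES (?, ?, ?)", rows)
    return conn


def success_rate(conn):
    """Percent of error-free records over completed records, or None if none completed."""
    sql = """
    SELECT ROUND(
        100.0 * SUM(CASE WHEN mfg IS NULL AND qc IS NULL THEN 1 ELSE 0 END)
              / NULLIF(SUM(CASE WHEN date_completed IS NOT NULL THEN 1 ELSE 0 END), 0),
        1)
    FROM batch_records
    """
    return conn.execute(sql).fetchone()[0]


conn = make_db([
    (1, None, "2011-01-04"),
    (None, None, "2011-02-02"),
    (None, None, None),
    (1, None, "2011-05-04"),
    (None, 2, "2011-03-06"),
])
print(success_rate(conn))  # 2 error-free records over 4 completed ones
```

If a default such as 0 is preferred over NULL, the whole expression can be wrapped in COALESCE(..., 0).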