How to convert null rows to 0 and sum the entire column using DB2? - db2

I'm using the following query to sum entire columns. The TO_REMOVEALLPRIV column contains both integer and null values.
I want to sum the column, treating the nulls as 0, and print the total.
Here is my query, which prints the sum as null:
select
sum(URT.PRODSYS) as URT_SUM_PRODSYS,
sum(URT.Users) as URT_SUM_USERS,
sum(URT.total_orphaned) as URT_SUM_TOTAL_ORPHANED,
sum(URT.Bp_errors) as URT_SUM_BP_ERRORS,
sum(URT.Ma_errors) as URT_SUM_MA_ERRORS,
sum(URT.Pp_errors) as URT_SUM_PP_ERRORS,
sum(URT.REQUIREURTCBN) as URT_SUM_CBNREQ,
sum(URT.REQUIREURTQEV) as URT_SUM_QEVREQ,
sum(URT.REQUIREURTPRIV) as URT_SUM_PRIVREQ,
sum(URT.cbnperf) as URT_SUM_CBNPERF,
sum(URT.qevperf) as URT_SUM_QEVPERF,
sum(URT.privperf) as URT_SUM_PRIVPERF,
sum(URT.TO_REMOVEALLPRIV) as TO_REMOVEALLPRIV_SUM
from
URTCUSTSTATUS URT
inner join CUSTOMER C on URT.customer_id=C.customer_id;
Instead of null, I need to print the sum of the rows that contain integers.

The SUM function automatically handles that for you. You said the column had a mix of NULL and numbers; the SUM automatically ignores the NULL values and correctly returns the sum of the numbers. You can read it on IBM Knowledge Center:
The function is applied to the set of values derived from the argument values by the elimination of null values.
Note: All aggregate functions ignore NULL values except COUNT(*), which counts rows. Example: if you have two records with values 5 and NULL, the SUM and AVG functions will both return 5, COUNT(*) will return 2, and COUNT(column) will return 1.
However, it seems that you misunderstood why you're getting NULL as a result. It's not because the column contains some null values; it's because there are no non-null values to add up: either no records are selected, or every selected value is NULL. Those are the only cases in which the SUM function returns NULL. If you want to return zero instead, you can use the COALESCE or IFNULL function. Both are the same for this scenario:
COALESCE(sum(URT.TO_REMOVEALLPRIV), 0) as TO_REMOVEALLPRIV_SUM
or
IFNULL(sum(URT.TO_REMOVEALLPRIV), 0) as TO_REMOVEALLPRIV_SUM
I'm guessing that you want to do the same to all other columns in your query, so I'm not sure why you only complained about the TO_REMOVEALLPRIV column.
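As a rough illustration of the behaviour described above, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for DB2 (an assumption made only so the example is runnable; the table `urt` and its contents are invented). It checks what SUM and COUNT return for a column that mixes numbers and NULLs, for a query that selects no rows, and for rows whose values are all NULL.

```python
import sqlite3

# In-memory SQLite database standing in for DB2 (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE urt (to_removeallpriv INTEGER)")
conn.executemany("INSERT INTO urt VALUES (?)", [(5,), (None,), (7,)])

def one(sql):
    """Run a query that yields a single value and return that value."""
    return conn.execute(sql).fetchone()[0]

# NULLs are skipped, so only 5 and 7 are added.
mixed = one("SELECT SUM(to_removeallpriv) FROM urt")
# No rows selected: SUM has nothing to add and yields NULL (None in Python).
no_rows = one("SELECT SUM(to_removeallpriv) FROM urt WHERE 1 = 0")
# Rows selected, but every value is NULL: SUM again yields NULL.
all_null = one("SELECT SUM(to_removeallpriv) FROM urt WHERE to_removeallpriv IS NULL")
# Wrapping the aggregate in COALESCE turns that NULL into 0.
defaulted = one("SELECT COALESCE(SUM(to_removeallpriv), 0) FROM urt WHERE 1 = 0")
# COUNT(*) counts rows; COUNT(column) counts only non-NULL values.
count_star = one("SELECT COUNT(*) FROM urt")
count_col = one("SELECT COUNT(to_removeallpriv) FROM urt")

print(mixed, no_rows, all_null, defaulted, count_star, count_col)
```

The same statements should behave the same way on DB2, since this NULL handling comes from standard SQL aggregate semantics, but that has not been verified here.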

What you're looking for is the COALESCE function:
select
sum(URT.PRODSYS) as URT_SUM_PRODSYS,
sum(URT.Users) as URT_SUM_USERS,
sum(URT.total_orphaned) as URT_SUM_TOTAL_ORPHANED,
sum(URT.Bp_errors) as URT_SUM_BP_ERRORS,
sum(URT.Ma_errors) as URT_SUM_MA_ERRORS,
sum(URT.Pp_errors) as URT_SUM_PP_ERRORS,
sum(URT.REQUIREURTCBN) as URT_SUM_CBNREQ,
sum(URT.REQUIREURTQEV) as URT_SUM_QEVREQ,
sum(URT.REQUIREURTPRIV) as URT_SUM_PRIVREQ,
sum(URT.cbnperf) as URT_SUM_CBNPERF,
sum(URT.qevperf) as URT_SUM_QEVPERF,
sum(URT.privperf) as URT_SUM_PRIVPERF,
sum(COALESCE(URT.TO_REMOVEALLPRIV,0)) as TO_REMOVEALLPRIV_SUM
from
URTCUSTSTATUS URT
inner join CUSTOMER C on URT.customer_id=C.customer_id;
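One detail worth checking with this form: COALESCE inside SUM replaces NULL values row by row, but it cannot help when the join selects no rows at all. The sketch below, again using Python's sqlite3 as a stand-in for DB2 with invented tables and data, compares SUM(COALESCE(col, 0)) with COALESCE(SUM(col), 0) in both situations.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE urt (customer_id INTEGER, to_removeallpriv INTEGER)")
conn.execute("CREATE TABLE customer (customer_id INTEGER)")
conn.executemany("INSERT INTO urt VALUES (?, ?)", [(1, 4), (2, None)])
# The customer table starts out empty, so the inner join selects no rows.

JOIN = " FROM urt JOIN customer c ON urt.customer_id = c.customer_id"

def one(select_list):
    """Evaluate a single aggregate expression over the joined tables."""
    return conn.execute("SELECT " + select_list + JOIN).fetchone()[0]

# COALESCE inside SUM: nothing to sum, so the result is still NULL.
inner_empty = one("SUM(COALESCE(urt.to_removeallpriv, 0))")
# COALESCE around SUM: the NULL result is replaced by 0.
outer_empty = one("COALESCE(SUM(urt.to_removeallpriv), 0)")

# Once matching customers exist, both forms agree.
conn.executemany("INSERT INTO customer VALUES (?)", [(1,), (2,)])
inner_full = one("SUM(COALESCE(urt.to_removeallpriv, 0))")
outer_full = one("COALESCE(SUM(urt.to_removeallpriv), 0)")

print(inner_empty, outer_empty, inner_full, outer_full)
```

If zero is wanted even when the join matches nothing, wrapping the aggregate (COALESCE around SUM) is the form that covers that case.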

Related

Replacing null values by average of values grouped by concatenated categories in Teradata

Suppose that I have a lot of NULL values (missing values) in a column named 'score'. I want to replace them by a specific average not from all the values of the column 'score' but by groups that I built with a crosscategory from two concatenated categories:
This kind of query works for getting averages by groups:
SELECT
category1 || ' > ' || category2 AS crosscategory,
ROUND(CAST(AVG(score) AS FLOAT), 2) AS score_avg
FROM DatabaseName.TableName
GROUP BY crosscategory
ORDER BY score_avg;
This one works to replace NULL values by a constant:
SELECT
NVL(score, 0) AS score_without_missing_values
FROM DatabaseName.TableName
The problem I cannot solve is how to replace the NULL values not with a constant but with the group averages computed with AVG and GROUP BY.
Thank you very much for your help!
Seems you want a Group Average:
SELECT
t.*,
coalesce(score, AVG(score) OVER (PARTITION BY category1, category2)) AS score_avg
FROM DatabaseName.TableName AS t
I removed the ROUND/CAST, because AVG returns FLOAT by default and ROUND is probably not needed (if you need it, you might do better to cast to a DECIMAL).
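To see the effect of filling missing scores with their group average, here is a small sketch in Python with sqlite3 (a stand-in for Teradata; the table `t` and its rows are invented). It deliberately swaps the window function for a correlated subquery, because window-function support depends on the SQLite version bundled with Python; the per-row result is intended to be the same as AVG(score) OVER (PARTITION BY category1, category2).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (category1 TEXT, category2 TEXT, score REAL)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", [
    ("a", "x", 1.0), ("a", "x", 3.0), ("a", "x", None),
    ("b", "y", 10.0), ("b", "y", None),
    ("c", "z", None),  # a group with no known score at all
])

rows = conn.execute("""
    SELECT o.category1, o.category2,
           COALESCE(o.score,
                    (SELECT AVG(g.score)          -- average of the row's own group
                       FROM t AS g
                      WHERE g.category1 = o.category1
                        AND g.category2 = o.category2)) AS score_filled
      FROM t AS o
     ORDER BY o.rowid
""").fetchall()

# Keep only the filled scores, in insertion order.
filled = [r[2] for r in rows]
print(filled)
```

A group in which every score is missing keeps its NULL, because the average of no values is itself NULL.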

Why grouping method sum in slick returns Option even if column used for sum is mandatory column?

CREATE TABLE orders
(
id bigint NOT NULL,
...
created_on date NOT NULL,
quantity int NOT NULL,
...
CONSTRAINT orders_pkey PRIMARY KEY (id)
)
SELECT DATE(o.created_on) AS date, sum(quantity)
FROM orders o
GROUP BY date
ordersItemsQuery.groupBy(_.createdOn).map{
case (created, group) => (created, group.map(_.quantity).sum)
}
Notice that quantity is a NOT NULL column, yet group.map(_.quantity).sum returns Rep[Option[Int]] rather than Rep[Int]. Why?
The Slick method sum evaluates to Option[T], and shouldn't be confused with the standard Scala collections method sum that returns a non-optional value.
Slick's sum is optional because the aggregate may have no values to add up. That is, if you run SELECT SUM(column) FROM table and there are no rows, you do not get back zero from the database. Instead, you get back a single row containing NULL, which Slick represents as None. Slick is being consistent with this behaviour. Or rather: the sum is happening in SQL, on the database server, and doesn't produce a numeric result when there are no rows.
In contrast to the way a database works, Scala's sum does allow you to sum an empty list (List[Int]().sum) and get back zero.
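The contrast can be reproduced without Slick. The following sketch uses Python's sqlite3 (an assumption made for runnability; the answer itself concerns Slick and Scala) to show what an empty orders table returns for SUM with and without GROUP BY, next to an in-language sum over an empty list.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (created_on TEXT NOT NULL, quantity INTEGER NOT NULL)"
)
# No rows are inserted: the table is empty even though quantity is NOT NULL.

# Aggregate without GROUP BY: exactly one row, holding NULL (None in Python).
ungrouped = conn.execute("SELECT SUM(quantity) FROM orders").fetchall()

# Aggregate with GROUP BY: no groups exist, so no rows come back.
grouped = conn.execute(
    "SELECT created_on, SUM(quantity) FROM orders GROUP BY created_on"
).fetchall()

# Summing an empty list in the host language gives 0 instead.
python_sum = sum([])

print(ungrouped, grouped, python_sum)
```

The None in the ungrouped result is the value that an optional type such as Option[Int] has to be able to carry.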

Use sum function in calculated column

Is it possible to use a sum function in a calculated column?
If yes, I would like to create a calculated column that calculates the sum of a column in the same table over the rows whose date is earlier than or equal to the date of this entry. Is this possible?
And last, would this optimize repeated calls on this value over the exemplified view below?
SELECT ProductGroup, SalesDate, (
SELECT SUM(Sales)
FROM SomeList
WHERE (ProductGroup= KVU.ProductGroup) AND (SalesDate<= KVU.SalesDate)) AS cumulated
FROM SomeList AS KVU
Is it possible to use a sum function in a calculated column?
Yes, it's possible using a scalar-valued function (scalar UDF) for your computed column, but this would be a disaster. Using scalar UDFs for computed columns destroys performance. Adding a scalar UDF that accesses data (which would be required here) makes things even worse.
It sounds to me like you just need a good ol' fashioned index to speed things up. First some sample data:
IF OBJECT_ID('dbo.somelist','U') IS NOT NULL DROP TABLE dbo.somelist;
GO
CREATE TABLE dbo.somelist
(
ProductGroup INT NOT NULL,
[Month] TINYINT NOT NULL CHECK ([Month] <= 12),
Sales DECIMAL(10,2) NOT NULL
);
INSERT dbo.somelist
VALUES (1,1,22),(2,1,45),(2,1,25),(2,1,19),(1,2,100),(1,2,200),(2,2,50.55);
and the correct index:
CREATE NONCLUSTERED INDEX nc_somelist ON dbo.somelist(ProductGroup,[Month])
INCLUDE (Sales);
With this index in place this query would be extremely efficient:
SELECT s.ProductGroup, s.[Month], SUM(s.Sales)
FROM dbo.somelist AS s
GROUP BY s.ProductGroup, s.[Month];
If you needed to get a COUNT by month & product group you could create an indexed view like so:
CREATE VIEW dbo.vw_somelist WITH SCHEMABINDING AS
SELECT s.ProductGroup, s.[Month], TotalSales = COUNT_BIG(*)
FROM dbo.somelist AS s
GROUP BY s.ProductGroup, s.[Month];
GO
CREATE UNIQUE CLUSTERED INDEX uq_cl__vw_somelist ON dbo.vw_somelist(ProductGroup, [Month]);
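Once that indexed view was in place your COUNTs would be pre-aggregated. An indexed view can also include SUM, provided the view contains COUNT_BIG(*) (as this one does) and the summed expression is not nullable; Sales is declared NOT NULL here, so SUM(s.Sales) could be added to the view as well.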
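For the cumulative total the question asks about, a query over the sample data can be sketched as follows. This uses Python's sqlite3 rather than SQL Server, so it is only an approximation of the setup above: SQLite has no INCLUDE clause, so Sales is added as a trailing index key instead, and Month is stored as a plain INTEGER.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE somelist ("
    " ProductGroup INTEGER NOT NULL,"
    " Month INTEGER NOT NULL CHECK (Month <= 12),"
    " Sales REAL NOT NULL)"
)
conn.executemany(
    "INSERT INTO somelist VALUES (?, ?, ?)",
    [(1, 1, 22), (2, 1, 45), (2, 1, 25), (2, 1, 19),
     (1, 2, 100), (1, 2, 200), (2, 2, 50.55)],
)
# Covering index; Sales is a trailing key because SQLite lacks INCLUDE.
conn.execute("CREATE INDEX nc_somelist ON somelist (ProductGroup, Month, Sales)")

# Running total per product group: all sales up to and including each month.
cumulated = conn.execute("""
    SELECT k.ProductGroup, k.Month,
           (SELECT SUM(s.Sales)
              FROM somelist AS s
             WHERE s.ProductGroup = k.ProductGroup
               AND s.Month <= k.Month) AS cumulated
      FROM (SELECT DISTINCT ProductGroup, Month FROM somelist) AS k
     ORDER BY k.ProductGroup, k.Month
""").fetchall()

for row in cumulated:
    print(row)
```

The correlated subquery reads only the indexed columns, which is the access pattern the index above is meant to serve.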

Update with ISNULL and operation

The original MySQL query looks like this:
UPDATE reponse_question_finale t1, reponse_question_finale t2 SET
t1.nb_question_repondu = (9-(ISNULL(t1.valeur_question_4)+ISNULL(t1.valeur_question_6)+ISNULL(t1.valeur_question_7)+ISNULL(t1.valeur_question_9))) WHERE t1.APPLICATION = t2.APPLICATION;
I know you cannot update two tables in a single query, so I tried this:
UPDATE reponse_question_finale t1
SET nb_question_repondu = (9-(COALESCE(t1.valeur_question_4,'')::int+COALESCE(t1.valeur_question_6,'')::int+COALESCE(t1.valeur_question_7)::int+COALESCE(t1.valeur_question_9,'')::int))
WHERE t1.APPLICATION = t1.APPLICATION;
But this query gives me an error: invalid input syntax for integer: ""
I saw that the Postgres equivalent of MySQL's ISNULL() is COALESCE(), so I think I'm on the right track here.
I also know you cannot add varchar to varchar, so I tried to cast to integer first. I'm not sure whether I cast it correctly, with the parentheses in the right place; given the error, maybe I cannot cast to int with coalesce.
Last thing: I could certainly use a correlated sub-select to update my two tables, but I'm a little lost at this point.
The output must be an integer matching the number of questions answered to a backup survey.
Any thoughts?
Thanks.
coalesce() returns the first non-null value from the list supplied. So, if the column value is null the expression COALESCE(t1.valeur_question_4,'') returns an empty string and that's why you get the error.
But it seems you want something completely different: you want to check whether each column is null (or empty) and subtract 1 for every one that is, so that you end up with the number of answered questions.
To return 1 if a value is null (or empty) and 0 if it isn't, you can use:
(nullif(valeur_question_4, '') is null)::int
nullif returns null if the first value equals the second. The IS NULL condition returns a boolean (something that MySQL doesn't have), and that can be cast to an integer (where false is cast to 0 and true to 1).
So the whole expression should be:
nb_question_repondu = 9 - (
(nullif(t1.valeur_question_4,'') is null)::int
+ (nullif(t1.valeur_question_6,'') is null)::int
+ (nullif(t1.valeur_question_7,'') is null)::int
+ (nullif(t1.valeur_question_9,'') is null)::int
)
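A quick way to check the arithmetic is to run the same idea on a couple of sample rows. The sketch below uses Python's sqlite3 as a stand-in for Postgres; the table and column names are shortened, hypothetical versions of the ones above. In SQLite the IS NULL test already evaluates to 0 or 1, so the ::int cast that Postgres needs is not written here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE reponse (id INTEGER, q4 TEXT, q6 TEXT, q7 TEXT, q9 TEXT,"
    " nb_question_repondu INTEGER)"
)
conn.executemany(
    "INSERT INTO reponse (id, q4, q6, q7, q9) VALUES (?, ?, ?, ?, ?)",
    [
        (1, "3", "2", "5", "1"),   # all four optional questions answered
        (2, None, "", "4", None),  # three of them missing (NULL or empty)
    ],
)

# Subtract 1 from 9 for every optional question that is NULL or empty.
conn.execute("""
    UPDATE reponse
       SET nb_question_repondu = 9 - (
             (NULLIF(q4, '') IS NULL)
           + (NULLIF(q6, '') IS NULL)
           + (NULLIF(q7, '') IS NULL)
           + (NULLIF(q9, '') IS NULL))
""")

answered = dict(conn.execute("SELECT id, nb_question_repondu FROM reponse"))
print(answered)
```

Row 2 loses one point for each of its three missing answers, which is why an empty string has to be mapped to NULL before the test.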
Another option is to unpivot the columns and do a select on them in a sub-select:
update reponse_question_finale
set nb_question_repondu = (select count(*)
from (
values
(valeur_question_4),
(valeur_question_6),
(valeur_question_7),
(valeur_question_9)
) as t(q)
where nullif(trim(q),'') is not null);
Adding more columns to be considered is then quite easy, as you just need to add a single line to the values() clause.

Create function to compute an average of 3 values

I'm trying to write a function in PostgreSQL to take an average across three columns. I have written the following function:
create function xcol_avg (col1, col2, col3)
returns numeric as $$
begin
return (coalesce(col1, 0) + coalesce(col2,0) +coalesce(col3, 0))/
case when (col 1 is null or col1 = 0 then 0 else 1 end +
case when (col 2 is null or col2 = 0 then 0 else 1 end +
case when (col 3 is null or col3 = 0 then 0 else 1 end;
end
What is the problem with my code? Also, is there a way to get the function to return null if it ends up dividing by 0? Any help is really appreciated.
Thanks!
Actually, you can make a function that accepts a variable number of arguments and computes the average over however many are passed. In Postgres there's a keyword, VARIADIC, for such things:
SQL functions can be declared to accept variable numbers of arguments, so long as all the "optional" arguments are of the same data type
Function code:
CREATE FUNCTION xcol_avg(numeric, VARIADIC numeric[])
RETURNS numeric
LANGUAGE plpgsql
IMMUTABLE
AS $$
BEGIN
RETURN (SELECT AVG(vals) FROM unnest($2 || ARRAY[$1]) t(vals));
END;
$$;
Use case with different number of arguments:
select xcol_avg(1,6); -- returns 3.5
select xcol_avg(1,5.5,4); -- returns 3.5
select xcol_avg(1,2,3,4,5,6,7); -- returns 4
Explanation:
Marking a function as IMMUTABLE improves the execution time by allowing the optimizer to pre-evaluate the function. Immutable functions cannot modify the database and are guaranteed to always return the same results when called with the same input.
Declaring the last parameter of a function as VARIADIC which has to be of an array type lets you provide optional arguments that will be passed to the function as an array. Note that you don't explicitly write the array, you just list your parameters as you normally would.
unnest() is a function that returns a set of rows by expanding an array. In other words, it "unpacks" the array elements into separate rows.
|| is an array operator that provides the array-to-array concatenation. Here it serves the purpose of connecting the first (required) argument with the rest given in a VARIADIC array.
AVG() is an aggregate function that computes an average of all input values. In our case it would take "unpacked" rows from a column named vals and compute the average.
With this solution you don't need to worry about dividing by zero, as at least one argument is required and avg() is doing the job you wanted to do manually by building up the denominator.
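The behaviour relied on here, an average that skips NULL inputs and yields NULL when nothing is left to average, can be mirrored outside the database. The following is a minimal Python sketch, not the Postgres function itself: the name xcol_avg is reused only for readability, None plays the role of NULL, and Decimal stands in for numeric.

```python
from decimal import Decimal
from typing import Optional

def xcol_avg(first, *rest) -> Optional[Decimal]:
    """Average the non-None arguments; return None if there are none.

    Mirrors how SQL's AVG skips NULL inputs and returns NULL when it has
    no non-NULL input, so no division by zero can occur.
    """
    values = [Decimal(str(v)) for v in (first, *rest) if v is not None]
    if not values:
        return None  # nothing to average: the analogue of SQL NULL
    return sum(values) / len(values)

print(xcol_avg(1, 6))
print(xcol_avg(1, 5.5, 4))
print(xcol_avg(1, 2, 3, 4, 5, 6, 7))
print(xcol_avg(None, 4, 6))
print(xcol_avg(None, None))
```

Note that, like AVG, this sketch skips only missing values; a zero still counts as a value, unlike the manual denominator in the question, which also excluded zeros.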
Apply it in a query:
This function would also work for computing an average of multiple columns in a row. Consider a table tbl with columns name, cost1, cost2, cost3 and below statement:
SELECT
name, cost1, cost2, cost3,
xcol_avg(cost1, cost2, cost3) AS average_cost
FROM tbl
For more general information about CREATE FUNCTION, see the PostgreSQL documentation.