Why grouping method sum in slick returns Option even if column used for sum is mandatory column? - scala

CREATE TABLE orders
(
id bigint NOT NULL,
...
created_on date NOT NULL,
quantity int NOT NULL,
...
CONSTRAINT orders_pkey PRIMARY KEY (id)
)
SELECT DATE(o.created_on) AS date, sum(quantity)
FROM orders o
GROUP BY date
ordersItemsQuery.groupBy(_.createdOn).map{
case (created, group) => (created, group.map(_.quantity).sum)
}
notice quantity is not null column, group.map(_.quantity).sum returns Rep[Option[Int]] but not Rep[Int] why?

The Slick method sum evaluates Option[T], and shouldn't be confused with the standard Scala collections method sum that returns a non-optional value.
Slick's sum is optional because a query may produce no results. That is, if you run SELECT SUM(column) FROM table and there are no rows, you do not get back zero from the database. Instead, you get back no rows. Slick is being consistent with this behaviour. Or rather: the sum is happening in SQL, on the database server, and doesn't produce a result when there are no rows.
In contrast to the way a database works, Scala's sum does allow you to sum an empty list (List[Int]().sum) and get back zero.

Related

Computed table column with MAX value between rows containing a shared value

I have the following table
CREATE TABLE T2
( ID_T2 integer NOT NULL PRIMARY KEY,
FK_T1 integer, <--- foreign key to T1(Table1)
FK_DATE date, <--- foreign key to T1(Table1)
T2_DATE date, <--- user input field
T2_MAX_DIFF COMPUTED BY ( (SELECT DATEDIFF (day, MAX(T2_DATE), CURRENT_DATE) FROM T2 GROUP BY FK_T1) )
);
I want T2_MAX_DIFF to display the number of days since last input across all similar entries with a common FK_T1.
It does work, but if another FK_T1 values is added to the table, I'm getting an error about "multiple rows in singleton select".
I'm assuming that I need some sort of WHERE FK_T1 = FK_T1 of corresponding row. Is it possible to add this? I'm using Firebird 3.0.7 with flamerobin.
The error "multiple rows in singleton select" means that a query that should provide a single scalar value produced multiple rows. And that is not unexpected for a query with GROUP BY FK_T1, as it will produce a row per FK_T1 value.
To fix this, you need to use a correlated sub-query by doing the following:
Alias the table in the subquery to disambiguate it from the table itself
Add a where clause, making sure to use the aliased table (e.g. src, and src.FK_T1), and explicitly reference the table itself for the other side of the comparison (e.g. T2.FK_T1)
(optional) remove the GROUP BY clause because it is not necessary given the WHERE clause. However, leaving the GROUP BY in place may uncover certain types of errors.
The resulting subquery then becomes:
(SELECT DATEDIFF (day, MAX(src.T2_DATE), CURRENT_DATE)
FROM T2 src
WHERE src.FK_T1 = T2.FK_T1
GROUP BY src.FK_T1)
Notice the alias src for the table referenced in the subquery, the use of src.FK_T1 in the condition, and the explicit use of the table in T2.FK_T1 to reference the column of the current row of the table itself. If you'd use src.FK_T1 = FK_T1, it would compare with the FK_T1 column of src (as if you'd used src.FK_T1 = src.FK_T2), so that would always be true.
CREATE TABLE T2
( ID_T2 integer NOT NULL PRIMARY KEY,
FK_T1 integer,
FK_DATE date,
T2_DATE date,
T2_MAX_DIFF COMPUTED BY ( (
SELECT DATEDIFF (day, MAX(src.T2_DATE), CURRENT_DATE)
FROM T2 src
WHERE src.FK_T1 = T2.FK_T1
GROUP BY src.FK_T1) )
);

Use sum function in calculated column

Is it possible to use a sum function in a calculated column?
If yes, I would like to create a calculated column, that calculates the sum of a column in the same table where the date is smaller than the date of this entry. is this possible?
And last, would this optimize repeated calls on this value over the exemplified view below?
SELECT ProductGroup, SalesDate, (
SELECT SUM(Sales)
FROM SomeList
WHERE (ProductGroup= KVU.ProductGroup) AND (SalesDate<= KVU.SalesDate)) AS cumulated
FROM SomeList AS KVU
Is it possible to use a sum function in a calculated column?
Yes, it's possible using a scalar valued function (scalar UDF) for you computed column but this would be a disaster. Using scalar UDFs for computed columns destroy performance. Adding a scalar UDF that accesses data (which would be required here) makes things even worse.
It sounds to me like you just need a good ol' fashioned index to speed things up. First some sample data:
IF OBJECT_ID('dbo.somelist','U') IS NOT NULL DROP TABLE dbo.somelist;
GO
CREATE TABLE dbo.somelist
(
ProductGroup INT NOT NULL,
[Month] TINYINT NOT NULL CHECK ([Month] <= 12),
Sales DECIMAL(10,2) NOT NULL
);
INSERT dbo.somelist
VALUES (1,1,22),(2,1,45),(2,1,25),(2,1,19),(1,2,100),(1,2,200),(2,2,50.55);
and the correct index:
CREATE NONCLUSTERED INDEX nc_somelist ON dbo.somelist(ProductGroup,[Month])
INCLUDE (Sales);
With this index in place this query would be extremely efficient:
SELECT s.ProductGroup, s.[Month], SUM(s.Sales)
FROM dbo.somelist AS s
GROUP BY s.ProductGroup, s.[Month];
If you needed to get a COUNT by month & product group you could create an indexed view like so:
CREATE VIEW dbo.vw_somelist WITH SCHEMABINDING AS
SELECT s.ProductGroup, s.[Month], TotalSales = COUNT_BIG(*)
FROM dbo.somelist AS s
GROUP BY s.ProductGroup, s.[Month];
GO
CREATE UNIQUE CLUSTERED INDEX uq_cl__vw_somelist ON dbo.vw_somelist(ProductGroup, [Month]);
Once that indexed view was in place your COUNTs would be pre-aggregated. You cannot, however, include SUM in an indexed view.

How to convert null rows to 0 and sum the entire column using DB2?

I'm using the following query to sum the entire column. In the TOREMOVEALLPRIV column, I have both integer and null values.
I want to sum both null and integer values and print the total sum value.
Here is my query which print the sum values as null:
select
sum(URT.PRODSYS) as URT_SUM_PRODSYS,
sum(URT.Users) as URT_SUM_USERS,
sum(URT.total_orphaned) as URT_SUM_TOTAL_ORPHANED,
sum(URT.Bp_errors) as URT_SUM_BP_ERRORS,
sum(URT.Ma_errors) as URT_SUM_MA_ERRORS,
sum(URT.Pp_errors) as URT_SUM_PP_ERRORS,
sum(URT.REQUIREURTCBN) as URT_SUM_CBNREQ,
sum(URT.REQUIREURTQEV) as URT_SUM_QEVREQ,
sum(URT.REQUIREURTPRIV) as URT_SUM_PRIVREQ,
sum(URT.cbnperf) as URT_SUM_CBNPERF,
sum(URT.qevperf) as URT_SUM_QEVPERF,
sum(URT.privperf) as URT_SUM_PRIVPERF,
sum(URT.TO_REMOVEALLPRIV) as TO_REMOVEALLPRIV_SUM
from
URTCUSTSTATUS URT
inner join CUSTOMER C on URT.customer_id=C.customer_id;
Output image:
Expected Output:
Instead of null, I need to print sum of rows whichever have integers.
The SUM function automatically handles that for you. You said the column had a mix of NULL and numbers; the SUM automatically ignores the NULL values and correctly returns the sum of the numbers. You can read it on IBM Knowledge Center:
The function is applied to the set of values derived from the argument values by the elimination of null values.
Note: All aggregate functions ignore NULL values except the COUNT function. Example: if you have two records with values 5 and NULL, the SUM and AVG functions will both return 5, but the COUNT function will return 2.
However, it seems that you misunderstood why you're getting NULL as a result. It's not because the column contains null values, it's because there are no records selected. That's the only case when the SUM function returns NULL. If you want to return zero in this case, you can use the COALESCE or IFNULL function. Both are the same for this scenario:
COALESCE(sum(URT.TO_REMOVEALLPRIV), 0) as TO_REMOVEALLPRIV_SUM
or
IFNULL(sum(URT.TO_REMOVEALLPRIV), 0) as TO_REMOVEALLPRIV_SUM
I'm guessing that you want to do the same to all other columns in your query, so I'm not sure why you only complained about the TO_REMOVEALLPRIV column.
What you're looking for is the COALESCE function:
select
sum(URT.PRODSYS) as URT_SUM_PRODSYS,
sum(URT.Users) as URT_SUM_USERS,
sum(URT.total_orphaned) as URT_SUM_TOTAL_ORPHANED,
sum(URT.Bp_errors) as URT_SUM_BP_ERRORS,
sum(URT.Ma_errors) as URT_SUM_MA_ERRORS,
sum(URT.Pp_errors) as URT_SUM_PP_ERRORS,
sum(URT.REQUIREURTCBN) as URT_SUM_CBNREQ,
sum(URT.REQUIREURTQEV) as URT_SUM_QEVREQ,
sum(URT.REQUIREURTPRIV) as URT_SUM_PRIVREQ,
sum(URT.cbnperf) as URT_SUM_CBNPERF,
sum(URT.qevperf) as URT_SUM_QEVPERF,
sum(URT.privperf) as URT_SUM_PRIVPERF,
sum(COALESCE(URT.TO_REMOVEALLPRIV,0)) as TO_REMOVEALLPRIV_SUM
from
URTCUSTSTATUS URT
inner join CUSTOMER C on URT.customer_id=C.customer_id;

Conversion failure varchar to int on a column cast as int

I've created a view called vReceivedEmail which includes a varchar column called RowPrimaryKeyValue. This column usually stores primary key id values, such as 567781, but every now and then has the descriptive text 'Ad hoc message'.
I've set the view up so that it only shows records that would hold the primary key values and then CAST the column as int.
SELECT CAST(Email.RowPrimaryKeyValue as int) as DetailID
FROM MessageStore.dbo.Email Email
WHERE ISNUMERIC(Email.RowPrimaryKeyValue) = 1
When I test the view, I only get records with the primary key values I expected, and when I look at the column listing for the view in the object explorer, the column is saved at the int data type.
I've then tried to apply the following WHERE clause in a separate query, referencing my saved view:
SELECT PotAcc.DetailID
FROM PotentialAccounts PotAcc
WHERE PotAcc.DetailID NOT IN (
SELECT RecEmail.DetailID
FROM vReceivedEmail RecEmail
)
...but I'm returning the following error:
Conversion failed when converting the varchar value 'Ad hoc message'
to data type int.
'Ad hoc message' is data from the column I filtered and converted to int, but I don't know how it's managing to trip up on this since the view explcitly filters this out and converts the column to int.
Since you are on SQL Server 2012, how about using try_convert(int,RowPrimaryKeyValue) is not null instead of isnumeric()?
Your view would look like so:
select try_convert(int, RowPrimaryKeyValue) as DetailID
from MessageStore.dbo.Email Email
where try_convert(int, RowPrimaryKeyValue) is not null
In SQL Server 2012 and up: each of these will return null when the conversion fails instead of an error.
try_convert(datatype,val)
try_cast(val as datatype)
try_parse(val as datatype [using culture])
Why doesn’t isnumeric() work correctly? (SQL Spackle)
Isnumeric() can be deceiving... however I'm betting this is due to SQL Server optimization. Though your view works, that doesn't mean the outer query guarantees that only INT is returned. Instead of using a view... use a CTE and try it. And since you are on 2012, this can all be avoided how SqlZim explained above.
WITH CTE AS(
SELECT CAST(Email.RowPrimaryKeyValue as int) as DetailID
FROM MessageStore.dbo.Email Email
WHERE ISNUMERIC(Email.RowPrimaryKeyValue) = 1)
SELECT PotAcc.DetailID
FROM PotentialAccounts PotAcc
WHERE PotAcc.DetailID NOT IN (
SELECT RecEmail.DetailID
FROM CTE RecEmail
)

an empty row with null-like values in not-null field

I'm using postgresql 9.0 beta 4.
After inserting a lot of data into a partitioned table, i found a weird thing. When I query the table, i can see an empty row with null-like values in 'not-null' fields.
That weird query result is like below.
689th row is empty. The first 3 fields, (stid, d, ticker), are composing primary key. So they should not be null. The query i used is this.
select * from st_daily2 where stid=267408 order by d
I can even do the group by on this data.
select stid, date_trunc('month', d) ym, count(*) from st_daily2
where stid=267408 group by stid, date_trunc('month', d)
The 'group by' results still has the empty row.
The 1st row is empty.
But if i query where 'stid' or 'd' is null, then it returns nothing.
Is this a bug of postgresql 9b4? Or some data corruption?
EDIT :
I added my table definition.
CREATE TABLE st_daily
(
stid integer NOT NULL,
d date NOT NULL,
ticker character varying(15) NOT NULL,
mp integer NOT NULL,
settlep double precision NOT NULL,
prft integer NOT NULL,
atr20 double precision NOT NULL,
upd timestamp with time zone,
ntrds double precision
)
WITH (
OIDS=FALSE
);
CREATE TABLE st_daily2
(
CONSTRAINT st_daily2_pk PRIMARY KEY (stid, d, ticker),
CONSTRAINT st_daily2_strgs_fk FOREIGN KEY (stid)
REFERENCES strgs (stid) MATCH SIMPLE
ON UPDATE CASCADE ON DELETE CASCADE,
CONSTRAINT st_daily2_ck CHECK (stid >= 200000 AND stid < 300000)
)
INHERITS (st_daily)
WITH (
OIDS=FALSE
);
The data in this table is simulation results. Multithreaded multiple simulation engines written in c# insert data into the database using Npgsql.
psql also shows the empty row.
You'd better leave a posting at http://www.postgresql.org/support/submitbug
Some questions:
Could you show use the table
definitions and constraints for the
partions?
How did you load your data?
You get the same result when using
another tool, like psql?
The answer to your problem may very well lie in your first sentence:
I'm using postgresql 9.0 beta 4.
Why would you do that? Upgrade to a stable release. Preferably the latest point-release of the current version.
This is 9.1.4 as of today.
I got to the same point: "what in the heck is that blank value?"
No, it's not a NULL, it's a -infinity.
To filter for such a row use:
WHERE
case when mytestcolumn = '-infinity'::timestamp or
mytestcolumn = 'infinity'::timestamp
then NULL else mytestcolumn end IS NULL
instead of:
WHERE mytestcolumn IS NULL