Sum of sum of column over() - tsql

I'm basically trying to understand what the following query would return(and most importantly why):
SELECT SUM(SUM(column)) OVER() FROM table
In practice it returns one row with a sum which is actually the sum of the column over the whole result-set of the table. I don't get why we get this result though!

These will return the same value. Having them together like this is redundant. The innermost SUM will sum all row values, so the outermost SUM has nothing left to sum. You can look at the query plan and you will see that one of the aggregations is empty.
SELECT SUM(SUM(column)) OVER() FROM table
SELECT SUM(column) FROM table

Related

Postgres: counting records in two groups (existing foreign key or null)

I have a table items and a table batches. A batch can have n items associated by items.batch_id.
I'd like to write a query item counts in two groups batched and unbatched:
items WHERE batch_id IS NOT NULL (batched)
items WHERE batch_id IS NULL (unbatched)
The result should look like this
batched
unbatched
1200000
100
Any help appreciated, thank you!
EDIT:
I got stuck with using GROUP BY which turned out to be the wrong tool for the job.
You can use COUNT with `FILTER( WHERE)
it is called conditional count
CREATE TABLE items(item_id int, batch_id int)
CREATE TABLE
INSERT INTO items VALUEs(1,NULL),(2,NULL),(3,1)
INSERT 0 3
CREATE tABLe batch (batch_id int)
CREATE TABLE
select
count(*) filter (WHERE batch_id IS NOT NULL ) as "matched"
,
count(*) filter (WHERE batch_id IS NULL ) as "unmatched"
from items
matched
unmatched
1
2
SELECT 1
fiddle
The count() function seems to be the most likely basic tool here. Given an expression, it returns a count of the number of rows where that expression evaluates to non-null. Given the argument *, it counts all rows in the group.
To the extent that there is a trick, it is getting the batched an unbatched counts in the same result row. There are at least three ways to do that:
Using subqueries:
select
(select count(batch_id) from items) as batched,
(select count(*) from items where batch_id is null) as unbatched
-- no FROM
That's pretty straightforward. Each subquery is executed and produces one column of the result. Because no FROM clause is given in the outer query, there will be exactly one result row.
Using window functions:
select
count(batch_id) over () as batched,
(count(*) over () - count(batch_id) over ()) as unbatched
from items
limit 1
That will compute the batched and unbatched results for the whole table on every result row, one per row of the items table, but then only one result row is actually returned. It is reasonable to hope (though you would definitely want to test) that postgres doesn't actually compute those counts for all the rows that are culled by the limit clause. You might, for example, compare the performance of this option with that of the previous option.
Using count() with a filter clause, as described in detail in another answer.

Create SQL Column Counting Frequency of Value in other column

I have the first three columns in SQL. I want to create the 4th column called Count which counts the number of times each unique name appears in the Name column. I want my results to appears like the dataset below, so I don't want to do a COUNT and GROUP BY.
What is the best way to achieve this?
We can try to use COUNT window function
SELECT *,COUNT(*) OVER(PARTITION BY name ORDER BY year,month) count
FROM T
ORDER BY year,month
sqlfiddle

Why does Postgres choose different data solely based on columns selected?

I'm running two different queries with two unions each inside a subquery:
So the structure is:
SELECT *
FROM (subquery_1
UNION SELECT subquery_2)
Now, if I perform the query on the left, I get this result:
However, the query on the right returns this result:
How are the results differing even though the conditions have not changed in either query, and the only difference was one of the selected columns in a subquery?
This is very counter-intuitive.
The operator UNION removes duplicate rows from the returned resultset.
Removing a column from the SELECT statement may produce duplicate rows that would not exist if the removed column was there.
Try UNION ALL instead, which will return in any case all the rows of the unioned queries.
See a simplified demo.

Is distinct function deterministic? T-sql

I have table like below. For distinct combination of user ID and Product ID SQL will select product bought from store ID 1 or 2? Is it determinictic?
My code
SELECT (DISTINCT CONCAT(UserID, ProductID)), Date, StoreID FROM X
This isn't valid syntax. You can have
select [column_list] from X
or you can have
select distinct [column_list] from X
The difference is that the first will return one row for every row in the table while the second will return one row for every unique combination of the column values in your column list.
Adding "distinct" to a statement will reliably produce the same results every time unless the underlying data changes, so in this sense, "distinct" is deterministic. However, it is not a function so the term "deterministic" doesn't really apply.
You may actually want a "group by" clause like the following (in which case you have to actually specify how you want the engine to pick values for columns not in your group):
select
concat(UserId, ProductID)
, min(Date)
, max(Store)
from
x
group by
concat(UserId, ProductID)
Results:
results

DB2: add total number of rows to each row

I'm trying to figure out where in a CTE I get extra rows, or rows deleted, so I would like to - at one point in my CTE where I suspect there is a problem - add a column which just contain one number, namely the count of rows of the whole table.
I've tried count(), but it seem to want a "group by" clause, and row_number() over() just gives me number of rows, so I'm stuck here...
You can use count() as window function, just add:
count(*) over () as total_count
to your query.