Postgres grouping sets - Show count zero or named groups - postgresql

I have the following query:
select count(*)
from "my_table"
where field1 = true and field2 = 'delivered'
group by grouping sets((field3), (field4), (field5))
having field3 = true or field4 = true or field5
For each group in the grouping set, I want to get the count of the true values. The problem with this query, however, is that the result has no labels attached to it, and a group with no results is left out of the end result entirely. Is it possible to include groups with count 0, or to create a set of labels so that it is clear which group each count corresponds to?
Current result:
count
-----
1
2
Where field5 doesn't have any results so it isn't included.
Desired result:
count
-----
1
2
0
Or:
count_field3 | count_field4 | count_field5
------------------------------------------
1 | 2 | 0
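One way to get the second desired shape is conditional aggregation instead of grouping sets: without a GROUP BY the query always returns exactly one row, and each count falls back to 0 when nothing matches. This is only a sketch. It assumes PostgreSQL 9.4 or later (for the FILTER clause) and that field3, field4 and field5 are boolean columns. The sample rows are made up for illustration.

```sql
-- Hypothetical sample data; only the column names come from the question.
CREATE TEMP TABLE my_table (
    field1 boolean,
    field2 text,
    field3 boolean,
    field4 boolean,
    field5 boolean
);

INSERT INTO my_table VALUES
    (true, 'delivered', true,  true, false),
    (true, 'delivered', false, true, false),
    (true, 'pending',   true,  true, true);   -- excluded by the WHERE clause

-- One labelled count per column; a column with no true values yields 0.
SELECT count(*) FILTER (WHERE field3) AS count_field3,
       count(*) FILTER (WHERE field4) AS count_field4,
       count(*) FILTER (WHERE field5) AS count_field5
FROM my_table
WHERE field1 = true AND field2 = 'delivered';
-- count_field3 | count_field4 | count_field5
--            1 |            2 |            0
```

On versions without FILTER, `count(CASE WHEN field3 THEN 1 END)` should behave the same way, since count ignores NULLs.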

Related

How to compare all fields in several rows in one table with result true or false (PostgreSQL)

I have a table like this (for example):
Field1 | Field2 | Field3 | Field4 | .....
----------------------------------------
1 | a | c | c
1 | a | x | c
1 | a | c | c
2 | a | y | j
2 | b | y | k
2 | b | y | l
I need to select rows by the value of one field and then compare all the other fields across the selected rows, something like SELECT * WHERE Field1=1.....COMPARE
I would like to have a result like:
Field1 | Field2 | Field3 | Field4 | .....
----------------------------------------
true | true | false | true
This should work for fixed columns and if there are no NULL values:
SELECT
COUNT(DISTINCT t.col1) = 1,
COUNT(DISTINCT t.col2) = 1,
COUNT(DISTINCT t.col3) = 1,
...
FROM mytable t
WHERE t.filter_column = 'some_value'
GROUP BY col1;
If you have some nullable columns, perhaps you could give it a try with something like this instead of the COUNT(DISTINCT t.<colname>) = 1:
BOOL_AND(NOT EXISTS(
SELECT 1
FROM mytable t2
WHERE t2.filter_column = 'some_value'
AND t2.<colname> IS DISTINCT FROM t.<colname>
))
If you do not have fixed columns, you should try to build up a dynamic query by a function taking as parameters the tablename, the name of the filter-column and the value for the filter.
Another remark: if you remove the filter (the condition t.filter_column = 'some_value'), add t.filter_column as another output column and group by it, you should be able to receive the result of this query for all distinct values in your filter column.
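As an illustration of that last remark, here is a small sketch using the sample rows from the question. The table name mytable, the lower-case column names and the `_same` aliases are assumptions for the example. As noted above, the COUNT(DISTINCT ...) form is only reliable when the columns contain no NULLs.

```sql
-- Sample rows copied from the question; names are illustrative.
CREATE TEMP TABLE mytable (field1 int, field2 text, field3 text, field4 text);

INSERT INTO mytable VALUES
    (1, 'a', 'c', 'c'),
    (1, 'a', 'x', 'c'),
    (1, 'a', 'c', 'c'),
    (2, 'a', 'y', 'j'),
    (2, 'b', 'y', 'k'),
    (2, 'b', 'y', 'l');

-- No filter: one row per distinct field1 value, with one boolean per column
-- that is true when every row in the group holds the same value.
SELECT field1,
       count(DISTINCT field2) = 1 AS field2_same,
       count(DISTINCT field3) = 1 AS field3_same,
       count(DISTINCT field4) = 1 AS field4_same
FROM mytable
GROUP BY field1
ORDER BY field1;
-- field1 | field2_same | field3_same | field4_same
--      1 | t           | f           | t
--      2 | f           | t           | f
```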

MAX() usage in GROUP BY with non-numeric column

I have a table similar to the following
UserId | ActionType
--------------------
1 | Create
2 | Read
1 | Edit
2 | Create
3 | Read
I want to find the "highest" action that a user has done, with the following hierarchy Create > Edit > Read. Running the desired query should return
UserId | ActionType
-------------------
1 | Create
2 | Create
3 | Read
Is there a way to leverage MAX() in HIVE to do this? My structure looks like the following very basic query but I'm unsure how to compute the above ActionType column.
SELECT UserId, ??? FROM UserActions GROUP BY UserId;
I think possible solutions are CASE statements in the GROUP BY or converting the values into numeric values, such as (Read => 0, Edit => 1, Create => 2) and then doing a GROUP BY, but I am hoping there is a more elegant solution.
Thanks!
I don't know whether HiveQL supports subqueries, but this is the idea in plain SQL:
SELECT
a.UserId,
a.ActionType
FROM
UserActions AS a
WHERE
a.ActionType = (
SELECT
b.ActionType
From
(
SELECT
MAX(COUNT(*)),
c.ActionType
FROM
UserActions as c
WHERE
c.UserId = a.UserId
GROUP BY
c.ActionType
) as b
)
Below is a possible query in Hive:
select
t1.userId, t1.actionType,
min(case when t1.actionType='Create' then 1
when t1.actionType='Edit' then 2
when t1.actionType='Read' then 3
else 100 end) as GroupBy
from mytable t1 group by t1.userId, t1.actionType
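The numeric-rank idea mentioned in the question can be sketched as a single GROUP BY per user: map each action to a rank, take the smallest rank, and map it back to a label. This is a sketch under assumptions, not a tested Hive script. It reuses the UserActions table and column names from the question and assumes only the three listed action types matter. Any other value gets rank 100 and comes back as NULL.

```sql
-- Rank 1 is the "highest" action (Create > Edit > Read).
SELECT UserId,
       CASE MIN(CASE WHEN ActionType = 'Create' THEN 1
                     WHEN ActionType = 'Edit'   THEN 2
                     WHEN ActionType = 'Read'   THEN 3
                     ELSE 100 END)
            WHEN 1 THEN 'Create'
            WHEN 2 THEN 'Edit'
            WHEN 3 THEN 'Read'
       END AS ActionType
FROM UserActions
GROUP BY UserId;
```

For the sample rows in the question this gives Create for users 1 and 2 and Read for user 3. Using explicit ranks avoids relying on MAX() over the strings themselves, which would sort alphabetically rather than by the intended hierarchy.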

Counting all entries with KSQL

Is it possible to use KSQL to not only count entries of a specific column via GROUP BY but instead get an aggregate over all the entries that stream through the application?
I'm searching for something like this:
| Count all | Count id1 | count id2 |
| ---245----|----150----|----95-----|
Or more like this in KSQL:
[some timestamp] | Count all | 245
[some timestamp] | Count id1 | 150
[some timestamp] | Count id2 | 95
.
.
.
Thank you
- Tim
You cannot get both the overall count and the count for each key in the same query. You can use two queries here: one that counts each value in the given column and another that counts all values in the given column.
Let's assume you have a stream with two columns, col1 and col2.
To count each value in col1 with infinite window size you can use the following query:
SELECT col1, count(*) FROM mystream1 GROUP BY col1;
To count all the rows you need to write two queries, since KSQL always needs a GROUP BY clause for aggregation. First you create a new column with a constant value, and then you count the values in that new column. Because the value is constant, the count represents the count of all rows. Here is an example:
CREATE STREAM mystream2 AS SELECT 1 AS col3 FROM mystream1;
SELECT col3, count(*) FROM mystream2 GROUP BY col3;
This also works to get the total row count for a table:
ksql> SELECT COUNT(*) FROM `mytable` GROUP BY 1 EMIT CHANGES;
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|KSQL_COL_0 |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|2298
You can run an extended describe on the stream or table to see the total number of messages:
ksql> describe extended <stream or table name>
Sample output:
Local runtime statistics
------------------------
messages-per-sec: 0 total-messages: 2415888 last-message: 2019-12-06T02:29:43.005Z

Return names of users who are never associated with a value in a certain column in any row

Suppose I want to look at only names of people who never have a row corresponding to them which contains a certain value in another column. For example, in the following table...
name | value
-------+-------
joe | 0
joe | 3
joe | 2
joe | 3
bill | 0
bill | 1
bill | 2
... I'd like to say something like, "give me all of the users who do not ever have a value '1' in the value column." In this case, it would return just "joe".
In the real-life example, the table is gigantic, so it wasn't time-effective to create a subquery and do a where name not in (select * from table_name where value = 1). Is there a more efficient way to do something like this?
select name
from t
group by name
having not bool_or(value = 1)
http://www.postgresql.org/docs/current/static/functions-aggregate.html
Group by the name and keep only the groups in which value = 1 occurs zero times:
select name
from your_table
group by name
having sum(case when value = 1 then 1 else 0 end) = 0
The following query retrieves the users who never have the value 1:
SELECT
distinct(name) from test
where name not in (select name from test where value=1)
The following query retrieves all rows belonging to users who never have the value 1:
SELECT
* from test
where name not in (select name from test where value=1)
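A further variant of the same idea is an anti-join written with NOT EXISTS. Whether it is faster than the aggregate or NOT IN forms depends on the data and the planner, so treat this as a sketch to benchmark rather than a guaranteed improvement. The table name t and the sample rows are taken from the question, and the alias x is just illustrative. Unlike NOT IN, NOT EXISTS still returns rows when the subquery would contain a NULL name.

```sql
-- Sample rows copied from the question.
CREATE TEMP TABLE t (name text, value int);

INSERT INTO t VALUES
    ('joe', 0), ('joe', 3), ('joe', 2), ('joe', 3),
    ('bill', 0), ('bill', 1), ('bill', 2);

-- Keep a name only if no row with that name has value = 1.
SELECT DISTINCT t.name
FROM t
WHERE NOT EXISTS (
    SELECT 1
    FROM t AS x
    WHERE x.name = t.name
      AND x.value = 1
);
-- name
-- ------
-- joe
```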

Update Count column in Postgresql

I have a single table laid out as such:
id | name | count
1 | John |
2 | Jim |
3 | John |
4 | Tim |
I need to fill out the count column such that the result is the number of times the specific name shows up in the column name.
The result should be:
id | name | count
1 | John | 2
2 | Jim | 1
3 | John | 2
4 | Tim | 1
I can get the count of occurrences of unique names easily using:
SELECT COUNT(name)
FROM table
GROUP BY name
But that doesn't fit into an UPDATE statement due to it returning multiple rows.
I can also get it narrowed down to a single row by doing this:
SELECT COUNT(name)
FROM table
WHERE name = 'John'
GROUP BY name
But that doesn't allow me to fill out the entire column, just the 'John' rows.
You can do that with a common table expression:
with counted as (
select name, count(*) as name_count
from the_table
group by name
)
update the_table
set "count" = c.name_count
from counted c
where c.name = the_table.name;
Another (slower) option would be to use a correlated subquery:
update the_table
set "count" = (select count(*)
from the_table t2
where t2.name = the_table.name);
But in general it is a bad idea to store values that can easily be calculated on the fly:
select id,
name,
count(*) over (partition by name) as name_count
from the_table;
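If the count is needed in several places, one option in the spirit of computing it on the fly is to wrap that window query in a view. The sketch below assumes the id/name layout from the question. The view name the_table_with_counts is made up for the example, and the table is created as a temporary one only so the snippet is self-contained.

```sql
-- Sample rows copied from the question; no stored count column is needed.
CREATE TEMP TABLE the_table (id int, name text);

INSERT INTO the_table VALUES
    (1, 'John'), (2, 'Jim'), (3, 'John'), (4, 'Tim');

-- Hypothetical view name; the count is recomputed on every read.
CREATE TEMP VIEW the_table_with_counts AS
SELECT id,
       name,
       count(*) OVER (PARTITION BY name) AS name_count
FROM the_table;

SELECT * FROM the_table_with_counts ORDER BY id;
-- id | name | name_count
--  1 | John |          2
--  2 | Jim  |          1
--  3 | John |          2
--  4 | Tim  |          1
```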
Another method: using a derived table
UPDATE tb
SET count = t.count
FROM (
SELECT count(NAME)
,NAME
FROM tb
GROUP BY 2
) t
WHERE t.NAME = tb.NAME