Using GROUP BY and MAX together in DAX

I have a table like below:
and I want to group by the date and name and then order by the MAX of rate. I am using this expression:
NewTable =
CALCULATETABLE (
Table1,
GROUPBY ( Table1, Table1[Day], Table1[Name], "maxrate", MAX ( Table1[Rate] ) ))
But I receive an error. Can anyone explain how max and group by can be used together in DAX?

Just use the SUMMARIZE function instead of GROUPBY:
New Table = SUMMARIZE ( Table1, Table1[Day], Table1[Name], "maxrate", MAX ( Table1[Rate] ) )
GROUPBY requires an iterator (such as MAXX). For example, let's say your table has rate and quantity, and you want to calculate the max amount (rate * quantity). Then you should use GROUPBY:
New Table =
GROUPBY (
Table1,
Table1[Day],
Table1[Name],
"Max Amount", MAXX ( CURRENTGROUP (), Table1[Rate] * Table1[Quantity] )
)
Here, you first group Table1 by day and name, and then iterate over the current group to find the max amount.
GROUPBY is very handy in some complicated cases, but your situation seems straightforward.
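For readers more comfortable with SQL, the SUMMARIZE call above is roughly equivalent to a plain GROUP BY with MAX (a sketch only, assuming a Table1 with Day, Name, and Rate columns):
-- Rough SQL analogue of SUMMARIZE with a "maxrate" column
SELECT Day, Name, MAX(Rate) AS maxrate
FROM Table1
GROUP BY Day, Name;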

Related

PostgreSQL using different index for same query

I have a SQL query that uses an inner join on two tables and filters data based on several params. Going by the query plan, for different values of the query params (such as different date ranges), Postgres uses a different index.
I am aware that Postgres decides whether an index should be used depending on the number of rows in the result set. But why does Postgres choose a different index for the same query? The query time varies by a factor of 10 between the two cases. How can I optimise the query, given that Postgres does not allow the user to specify which index to use?
Edit:
explain (analyze, buffers, verbose) SELECT COUNT(*) FROM "bookings" INNER JOIN "hotels" ON "hotels"."id" = "bookings"."hotel_id" WHERE "bookings"."hotel_id" = 37016 AND (bookings.status in (0,1,2,3,4,5,6,7,9,10,11,12)) AND (bookings.source in (0,1,2,3,4,5,6,7,8,9,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70) or bookings.status in (0,1,2,3,4,5,6,7,8,9,10,11,13)) AND (
bookings.source in (4,66,65)
OR
date(timezone('+05:30',bookings.created_at))>checkin
OR
(
( date(timezone('+05:30',bookings.created_at))=checkin
and
extract (epoch from COALESCE(cancellation_time,NOW())-bookings.created_at)>600
)
OR
( date(timezone('+05:30',bookings.created_at))<checkin
and
extract (epoch from COALESCE(cancellation_time,NOW())-bookings.created_at)>600
and
(
extract (epoch from ((bookings.checkin||' '||hotels.checkin_time)::timestamp -COALESCE(cancellation_time,bookings.checkin))) < extract(epoch from '16 hours'::interval)
OR
(DATE(bookings.checkout)-DATE(bookings.checkin))*(COALESCE(bookings.oyo_rooms,0)+COALESCE(bookings.owner_rooms,0)) > 3
)
)
)
) AND (bookings.checkin >= '2018-11-21') AND (bookings.checkin <= '2019-05-19') AND "bookings"."hotel_id" = '37016' AND "bookings"."status" IN (0, 1, 2, 3, 12);
QueryPlan : https://explain.depesz.com/s/SPeb
explain (analyze, buffers, verbose) SELECT COUNT(*) FROM "bookings" INNER JOIN "hotels" ON "hotels"."id" = 37016 WHERE "bookings"."hotel_id" = 37016 AND (bookings.status in (0,1,2,3,4,5,6,7,9,10,11,12)) AND (bookings.source in (0,1,2,3,4,5,6,7,8,9,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70) or bookings.status in (0,1,2,3,4,5,6,7,8,9,10,11,13)) AND (
bookings.source in (4,66,65)
OR
date(timezone('+05:30',bookings.created_at))>checkin
OR
(
( date(timezone('+05:30',bookings.created_at))=checkin
and
extract (epoch from COALESCE(cancellation_time,now())-bookings.created_at)>600
)
OR
( date(timezone('+05:30',bookings.created_at))<checkin
and
extract (epoch from COALESCE(cancellation_time,now())-bookings.created_at)>600
and
(extract (epoch from ((bookings.checkin||' '||hotels.checkin_time)::timestamp -COALESCE(cancellation_time,bookings.checkin))) < extract(epoch from '16 hours'::interval)
OR
(DATE(bookings.checkout)-DATE(bookings.checkin))*(COALESCE(bookings.oyo_rooms,0)+COALESCE(bookings.owner_rooms,0)) > 3
)
)
)
) AND (bookings.checkin >= '2018-11-22') AND (bookings.checkin <= '2019-05-19') AND "bookings"."hotel_id" = '37016' AND "bookings"."status" IN (0,1,2,3,4,12);
QueryPlan: https://explain.depesz.com/s/DWD
Finally found the solution to this problem. I am querying on more than 10 possible values of a column (status in this case). If I break the query into multiple sub-queries, each filtering on only one status value, and aggregate the results using UNION ALL, then each subquery's plan uses the optimized index.
Results: the query time decreased by a factor of 10 with this change.
A possible explanation for this behaviour: the query planner estimates fewer rows for each subquery and therefore picks the optimized index. I am not sure whether this is the correct explanation.
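A minimal sketch of the split, heavily simplified from the original statement: each branch filters on a single status value so the planner can choose the best index for it independently, and <other conditions> stands in for the remaining predicates of the original query (it is not filled in here).
-- Sketch only: <other conditions> abbreviates the original filter predicates.
SELECT SUM(cnt)
FROM (
  SELECT COUNT(*) AS cnt FROM bookings
  WHERE hotel_id = 37016 AND status = 0
    AND checkin >= '2018-11-21' AND checkin <= '2019-05-19' -- AND <other conditions>
  UNION ALL
  SELECT COUNT(*) AS cnt FROM bookings
  WHERE hotel_id = 37016 AND status = 1
    AND checkin >= '2018-11-21' AND checkin <= '2019-05-19' -- AND <other conditions>
  UNION ALL
  SELECT COUNT(*) AS cnt FROM bookings
  WHERE hotel_id = 37016 AND status = 2
    AND checkin >= '2018-11-21' AND checkin <= '2019-05-19' -- AND <other conditions>
  -- ... one branch per remaining status value ...
) AS per_status;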

How to use a Count window function to get a percentage?

Say I have the following table and values:
CREATE TABLE test (
context text
);
INSERT INTO test VALUES ('idea'), ('vote'), ('vote'), ('query'), ('vote');
I want to be able to find out what percentage of the rows in the table are equal to vote, without using a subquery in the select statement (which, correct me if I'm off on this, would be re-calculated for every row in the result set).
I'm trying something like this:
SELECT context, count(*) / count(*) OVER ()::DOUBLE PRECISION AS pct
FROM test
GROUP BY context
but having no success. I would hope to see something like:
context_name pct_of_total for each distinct context type.
The numerator is the count partitioned by context and the denominator is the count without a partition:
select distinct
context
,round( 100.0* count(1) over (partition by context) / count(1) over () ,2) as perc
from test
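A variant without DISTINCT: group by context and divide each group's count by the total taken over all groups via a window function (a sketch against the test table above; with the sample data, vote comes out at 60.00):
-- count(*) per group divided by the sum of all group counts
SELECT context,
       round(100.0 * count(*) / sum(count(*)) OVER (), 2) AS perc
FROM test
GROUP BY context;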

Compare counts in PostgreSQL

I want to compare two results of queries on the same table by checking the resulting row count, but Postgres doesn't support column aliases in the WHERE clause.
select id from article where version=1308
and exists(
select count(ident) as count1 from artprice AS p1
where p1.valid_to<to_timestamp(1586642400000) or p1.valid_from>to_timestamp(1672441199000)
and p1.article=article.id
and p1.count1=(select count(ident) from artprice where article=article.id)
)
I also cannot use aggregate functions in the WHERE clause, so
select id from article where version=1308
and exists(
select count(ident) as count1 from artprice AS p1
where p1.valid_to<to_timestamp(1586642400000) or p1.valid_from>to_timestamp(1672441199000)
and p1.article=article.id
and p1.count(ident)=(select count(ident) from artprice where article=article.id)
)
also doesn't work. Any ideas?
EDIT:
What I want to get are articles where every article price is outside of a valid range defined by validFrom and validTo.
I now changed the statement by negating the positive conditions:
Select distinct article.id from Article article, ArtPrice price
where
(
(article.version=?)
and
(
(
(
(
(not(price.valid_from>=?)) or (not(price.valid_to<=?))
)
and
(
(not(price.valid_from<=?)) or (not(price.valid_to>=?))
)
)
and
(
(not(price.valid_to>=?)) or (not(price.valid_to<=?))
)
)
and
(
(not(price.valid_from>=?)) or (not(price.valid_from<=?))
)
)
) and article.id=price.article
Probably not the most elegant solution, but it works.
Aggregates are not allowed in the WHERE clause, but there's the HAVING clause for them.
EDIT: What I want to get are articles where every article price is outside of a valid range defined by validFrom and validTo.
I think that bool_or() would be a good fit here when combined with range operations:
SELECT article.id
FROM Article AS article
JOIN ArtPrice AS price ON price.article = article.id
WHERE article.version = 1308
GROUP BY article.id
HAVING NOT bool_or(tsrange(price.valid_from, price.valid_to)
&& tsrange(to_timestamp(1586642400000),
to_timestamp(1672441199000)))
This reads as "...those having not any price tsrange overlap with given tsrange".
PostgreSQL also supports the SQL OVERLAPS operator:
(price.valid_from, price.valid_to) OVERLAPS (to_timestamp(1586642400000),
to_timestamp(1672441199000))
As a note, it operates on half-open intervals start <= time < end.
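If you prefer OVERLAPS to range types, the same HAVING approach can be written with it (a sketch, assuming the same Article/ArtPrice schema and the timestamps used above):
SELECT article.id
FROM Article AS article
JOIN ArtPrice AS price ON price.article = article.id
WHERE article.version = 1308
GROUP BY article.id
-- keep only articles where no price overlaps the given window
HAVING NOT bool_or((price.valid_from, price.valid_to)
                   OVERLAPS (to_timestamp(1586642400000),
                             to_timestamp(1672441199000)));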

How to retrieve the column values which are aggregated (like count) using a GROUP BY column

I want to display the contents of the aggregated column that is part of a GROUP BY SQL statement.
Example:
SELECT Shippers.ShipperName, COUNT(Orders.OrderID) AS NumberOfOrders
FROM Orders, Shippers
WHERE Orders.ShipperID = Shippers.ShipperID
GROUP BY ShipperName
In the above example the output gives me the count as one of the results, whereas I also need the actual Orders.OrderID values that were aggregated. So if one result's count shows 2, I need to know which two values were grouped. This should be returned as another column in the same result set.
Try array_agg (the PostgreSQL counterpart of MySQL's GROUP_CONCAT):
SELECT Shippers.ShipperName, COUNT(Orders.OrderID) AS NumberOfOrders, array_agg(Orders.OrderID) AS which_are_those
FROM Orders, Shippers
WHERE Orders.ShipperID = Shippers.ShipperID
GROUP BY ShipperName
array_agg returns an array, but you can CAST it to text and edit as needed (see below).
Prior to PostgreSQL 8.4, you have to define the aggregate yourself before use:
CREATE AGGREGATE array_agg (anyelement)
(
sfunc = array_append,
stype = anyarray,
initcond = '{}'
);
or simply this:
SELECT Shippers.ShipperName, COUNT(Orders.OrderID) AS NumberOfOrders, string_agg(Orders.OrderID::text, ',') AS which_are_those
FROM Orders, Shippers
WHERE Orders.ShipperID = Shippers.ShipperID
GROUP BY ShipperName
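A minimal self-contained illustration of both aggregates (hypothetical data; string_agg requires PostgreSQL 9.0+):
CREATE TABLE Orders (OrderID int, ShipperID int);
INSERT INTO Orders VALUES (101, 1), (102, 1), (103, 2);

SELECT ShipperID,
       COUNT(OrderID)                 AS NumberOfOrders,
       array_agg(OrderID)             AS which_are_those,
       string_agg(OrderID::text, ',') AS as_text
FROM Orders
GROUP BY ShipperID;
-- ShipperID 1 yields NumberOfOrders = 2, which_are_those = {101,102}, as_text = '101,102'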

Need to copy all rows with C_PROV_TYPE = '014' and C_SPECIALTY = '300' and insert back 3 rows with the same data + max sequence number + 1, i.e. = 4, 5, 6

I need 3 new rows for each one of the two valid rows displayed below.
The primary key is C_PROCEDURE + C_PROV_TYPE + SPEC_SEQ_NO!
The output should look like below:
You could try something like this:
INSERT INTO YourTable (
C_PROCEDURE,
C_PROV_TYPE,
I_PT_SPEC_SEQ_NO,
C_SPECIALTY
)
SELECT
s.C_PROCEDURE,
s.C_PROV_TYPE,
s.MaxSeq + ROW_NUMBER() OVER (
PARTITION BY s.C_PROCEDURE, s.C_PROV_TYPE
ORDER BY v.rn, s.I_PT_SPEC_SEQ_NO),
s.C_SPECIALTY + v.rn
FROM (
SELECT
*,
MAX(I_PT_SPEC_SEQ_NO) OVER (
PARTITION BY C_PROCEDURE, C_PROV_TYPE
) AS MaxSeq
FROM YourTable
) s
CROSS JOIN (
VALUES (1), (2), (3)
) v (rn)
WHERE s.C_PROV_TYPE = '014'
AND s.C_SPECIALTY = '300'
;
Basically, the subquery returns all the YourTable rows supplied with the maximum values of I_PT_SPEC_SEQ_NO for every partition of (C_PROCEDURE, C_PROV_TYPE) using the windowing MAX() function (MAX(...) OVER (...)).
The resulting set of that subquery is then cross-joined to an inline 3-row table (which produces three copies of every row returned) and filtered by the specified values of C_PROV_TYPE and C_SPECIALTY.
New data rows pull C_PROCEDURE and C_PROV_TYPE directly from the subquery. The new C_SPECIALTY values are produced using those from the subquery and the rn values of the inline table. The new sequence numbers are generated with the help of the ROW_NUMBER() function and the maximum sequence numbers returned by the subquery.
As I didn't have access to a working installation of DB2, I tested my script in SQL Server 2008, trying to stick to features that I understood DB2 supports as well as SQL Server. This SQL Fiddle demo also uses a SQL Server 2008 instance to demonstrate how the query works.
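As a tiny, isolated illustration of the cross-join trick used above (a sketch with a hypothetical two-row derived table t; the VALUES constructor works in both SQL Server 2008+ and PostgreSQL):
-- Each row of t is paired with every rn value, tripling the row count
SELECT t.id, v.rn
FROM (VALUES (10), (20)) AS t (id)
CROSS JOIN (VALUES (1), (2), (3)) AS v (rn);
-- returns 6 rows: each id repeated once for rn = 1, 2, 3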