Grouping + aggregation of itab with table comprehensions - aggregate

This is a rather typical task, but I'm stuck on doing it in a beautiful way.
For example, I need to find the last shipment for each vendor, i.e. the delivery with the max date for each vendor:
VENDOR DELIVERY DATE
10 00055 01/01/2019
20 00070 01/19/2019
20 00088 01/20/2019
20 00120 11/22/2019
40 00150 04/01/2019
40 00200 04/10/2019
The result table to be populated
VENDOR DELIVERY DATE
10 00055 01/01/2019
20 00120 11/22/2019
40 00200 04/10/2019
I implemented this in the following way, using DESCENDING, which I find very ugly:
LOOP AT itab ASSIGNING <wa> GROUP BY ( ven_no = <wa>-ven_no ) REFERENCE INTO DATA(vendor).
  LOOP AT GROUP vendor ASSIGNING <ven> GROUP BY ( date = <ven>-date ) DESCENDING.
    CHECK NOT line_exists( it_vend_max[ ven_no = <ven>-ven_no ] ).
    it_vend_max = VALUE #( BASE it_vend_max ( <ven> ) ).
  ENDLOOP.
ENDLOOP.
Is there a more elegant way to do this?
I also tried REDUCE
result = REDUCE #( vend_line = value ty_s_vend()
MEMBERS = VALUE ty_t_vend( )
FOR GROUPS <group_key> OF <wa> IN itab
GROUP BY ( key = <wa>-ven_no count = GROUP SIZE
ASCENDING
NEXT vend_line = VALUE #(
ven_no = <wa>-ven_no
date = REDUCE i( INIT max = 0
FOR m IN GROUP <group_key>
NEXT max = nmax( val1 = m-date
val2 = <wa>-date ) )
deliv_no = <wa>-deliv_no
MEMBERS = VALUE ty_s_vend( FOR m IN GROUP <group_key> ( m ) ) ).
But REDUCE selects the max date from the whole table and returns only a flat structure, which is not what I want. However, in the ABAP examples I saw samples where table-to-table reductions are also possible. Am I wrong?
Another thing I tried is finding unique values with WITHOUT MEMBERS, but this syntax doesn't work:
it_vend_max = VALUE ty_t_vend( FOR GROUPS value OF <line> IN itab
GROUP BY ( <line>-ven_no <line>-ship_no )
WITHOUT MEMBERS ( lifnr = value
date = nmax( val1 = <line>-date
val2 = value-date ) ) ).
Any suggestion about what is wrong here, or your own elegant solution, is appreciated.

If not too complex, I think it's best to use one construction expression, which shows that the goal of the expression is to initialize one variable and nothing else.
This is the best I could do in terms of performance and brevity, but I can't make it elegant:
TYPES ty_ref_s_vend TYPE REF TO ty_s_vend.
result = VALUE ty_t_vend(
  FOR GROUPS <group_key> OF <wa> IN itab
  GROUP BY ( ven_no = <wa>-ven_no ) ASCENDING
  LET max2 = REDUCE ty_ref_s_vend(
    INIT max TYPE ty_ref_s_vend
    FOR <m> IN GROUP <group_key>
    NEXT max = COND #( WHEN max IS NOT BOUND
                         OR <m>-date > max->*-date
                       THEN REF #( <m> ) ELSE max ) )
  IN ( max2->* ) ).
As you can see, I use a data reference (max2) for better performance, pointing to the line that has the most recent date. It's theoretically faster than copying the bytes of the whole line, but it's less readable. Unless the table is huge, there won't be a big difference between using an auxiliary data reference and an auxiliary data object.
PS: I could not test it because the question does not provide an MCVE.
Here is another solution if you really want to use REDUCE in the primary constructor expression (but it's not needed):
result = REDUCE ty_t_vend(
INIT vend_lines TYPE ty_t_vend
FOR GROUPS <group_key> OF <wa> IN itab
GROUP BY ( ven_no = <wa>-ven_no ) ASCENDING
NEXT vend_lines = VALUE #(
LET max2 = REDUCE ty_ref_s_vend(
INIT max TYPE ty_ref_s_vend
FOR <m> IN GROUP <group_key>
NEXT max = COND #( WHEN max IS NOT BOUND
OR <m>-date > max->*-date
THEN REF #( <m> ) ELSE max ) )
IN BASE vend_lines
( max2->* ) ) ).
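The idea behind both versions above (one pass per vendor group that keeps a reference to the line with the latest date instead of copying it) is not ABAP-specific. Here is a minimal Python sketch of it, assuming rows are dicts with hypothetical keys `ven_no`, `deliv_no` and `date` modelled on the question's sample data; storing the row dict itself in `best` plays the role of the data reference.

```python
from datetime import date


def latest_per_vendor(rows):
    """Return, for each vendor, the row with the latest date (vendors ascending)."""
    best = {}  # ven_no -> the row dict itself, i.e. a reference, not a copy
    for row in rows:
        current = best.get(row["ven_no"])
        # Strict ">" keeps the first row encountered when two dates tie
        if current is None or row["date"] > current["date"]:
            best[row["ven_no"]] = row
    # Mirror GROUP BY ... ASCENDING by returning the groups in key order
    return [best[ven_no] for ven_no in sorted(best)]


itab = [
    {"ven_no": 10, "deliv_no": "00055", "date": date(2019, 1, 1)},
    {"ven_no": 20, "deliv_no": "00070", "date": date(2019, 1, 19)},
    {"ven_no": 20, "deliv_no": "00088", "date": date(2019, 1, 20)},
    {"ven_no": 20, "deliv_no": "00120", "date": date(2019, 11, 22)},
    {"ven_no": 40, "deliv_no": "00150", "date": date(2019, 4, 1)},
    {"ven_no": 40, "deliv_no": "00200", "date": date(2019, 4, 10)},
]

for row in latest_per_vendor(itab):
    print(row["ven_no"], row["deliv_no"], row["date"])
```

This prints the three lines of the expected result table: vendors 10, 20 and 40 with deliveries 00055, 00120 and 00200.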

What do you mean by an elegant solution? Using GROUP or REDUCE with the "new" ABAP syntax doesn't make it elegant in any way, at least for me...
For me, coding that is easily understandable for everyone is elegant:
SORT itab BY vendor date DESCENDING.
DELETE ADJACENT DUPLICATES from itab COMPARING vendor.
Or, if the example is more complex, a simple LOOP AT with IF or AT inside it, APPENDing aggregated lines to a new itab, will also solve it.
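The SORT plus DELETE ADJACENT DUPLICATES idea carries over directly to other languages. A Python sketch, assuming hypothetical dict rows with the question's `ven_no` and `date` fields: sort by vendor ascending and date descending, then keep only the first row of each run of equal vendors. It assumes the dates are real date values; sorting the MM/DD/YYYY strings from the question as plain text would give the wrong order, whereas ABAP date fields (type d, stored as YYYYMMDD) sort correctly.

```python
from datetime import date


def latest_per_vendor_sorted(rows):
    # SORT itab BY ven_no date DESCENDING:
    # vendor ascending, and within a vendor the newest date first
    ordered = sorted(rows, key=lambda r: (r["ven_no"], -r["date"].toordinal()))
    # DELETE ADJACENT DUPLICATES FROM itab COMPARING ven_no:
    # keep only the first row of each run with the same vendor
    result = []
    for row in ordered:
        if not result or result[-1]["ven_no"] != row["ven_no"]:
            result.append(row)
    return result


itab = [
    {"ven_no": 20, "deliv_no": "00070", "date": date(2019, 1, 19)},
    {"ven_no": 10, "deliv_no": "00055", "date": date(2019, 1, 1)},
    {"ven_no": 20, "deliv_no": "00120", "date": date(2019, 11, 22)},
    {"ven_no": 40, "deliv_no": "00200", "date": date(2019, 4, 10)},
    {"ven_no": 20, "deliv_no": "00088", "date": date(2019, 1, 20)},
    {"ven_no": 40, "deliv_no": "00150", "date": date(2019, 4, 1)},
]
print([r["deliv_no"] for r in latest_per_vendor_sorted(itab)])
```

Unlike the two ABAP statements, which modify itab in place, this sketch returns a new list and leaves the input untouched.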

Related

Using Group By and Max function together DAX

I have a table like below:
and I want to group by date and name and then order by the MAX of rate. I use this expression:
NewTable =
CALCULATETABLE (
Table1,
GROUPBY ( Table1, Table1[Day], Table1[Name], "maxrate", MAX ( Table1[Rate] ) ))
But I receive an error. Can anyone explain how max and group by can be used together in DAX?
Just use the SUMMARIZE function instead of GROUPBY:
New Table = SUMMARIZE ( Table1, Table1[Day], Table1[Name], "maxrate", MAX ( Table1[Rate] ) )
GROUPBY requires its aggregations to use an iterator (such as MAXX) over CURRENTGROUP(). For example, say your table has rate and quantity, and you want to calculate the max amount (rate * quantity). Then you should use GROUPBY:
New Table =
GROUPBY (
Table1,
Table1[Day],
Table1[Name],
"Max Amount", MAXX ( CURRENTGROUP (), Table1[Rate] * Table1[Quantity] )
)
Here, you first group Table1 by Day and Name, and then iterate over the current group to find the max amount.
GROUPBY is very handy in some complicated cases, but your situation seems straightforward.
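In general-purpose terms, the GROUPBY / CURRENTGROUP() / MAXX pattern means "group the rows, then iterate each group and take the maximum of an expression". A Python sketch with made-up sample rows; the `day`, `name`, `rate` and `quantity` keys stand in for the Table1 columns and the values are invented for illustration:

```python
from collections import defaultdict


def max_amount_by_day_and_name(rows):
    # Group rows by (day, name): each list plays the role of CURRENTGROUP()
    groups = defaultdict(list)
    for row in rows:
        groups[(row["day"], row["name"])].append(row)
    # Iterate each group and take the max of rate * quantity (the MAXX step)
    return {
        key: max(r["rate"] * r["quantity"] for r in members)
        for key, members in groups.items()
    }


table1 = [
    {"day": "Mon", "name": "A", "rate": 2.0, "quantity": 3},
    {"day": "Mon", "name": "A", "rate": 1.5, "quantity": 5},
    {"day": "Mon", "name": "B", "rate": 4.0, "quantity": 1},
    {"day": "Tue", "name": "A", "rate": 3.0, "quantity": 2},
]
print(max_amount_by_day_and_name(table1))
```

For the ("Mon", "A") group the larger amount comes from the row with the lower rate (1.5 * 5 = 7.5 versus 2.0 * 3 = 6.0), which is exactly why MAX over rate alone would not answer this question.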

Filter portal for most recently created record by group

I have a portal on my "Clients" table. The related table contains the results of surveys that are updated over time. For each combination of client and category (a field in the related table), I only want the portal to display the most recently collected row.
Here is a link to a trivial example that illustrates the issue I'm trying to address. I have two tables in this example (Related on ClientID):
Clients
Table 1 Get Summary Method
The Table 1 Get Summary Method table looks like this:
Where:
MaxDate is a summary field = Maximum of Date
MaxDateGroup is a calculated field = GetSummary ( MaxDate ; ClientIDCategory )
ShowInPortal = If ( Date = MaxDateGroup ; 1 ; 0 )
The table is sorted on ClientIDCategory
Issue 1 that I'm stumped on:
ShowInPortal should equal 1 in row 3 (PKTable01 = 5), row 4 (PKTable01 = 6), and row 6 (PKTable01 = 4) in the table above. I'm not sure why FM is interpreting 1Red and 1Blue as the same category, or perhaps I'm just misunderstanding what the GetSummary function does.
The Clients table looks like this:
Where:
The portal records are sorted on ClientIDCategory
Issue 2 that I'm stumped on:
I only want rows with a ShowInPortal value equal to 1 to appear in the portal. I tried creating a portal filter with the following formula: Table 1 Get Summary Method::ShowInPortal = 1. However, using that filter removes all rows from the portal.
Any help is greatly appreciated.
One solution is to use ExecuteSQL to grab the max date. This removes the need for summary functions and sorts, and works as expected. I suggest returning it as a number to avoid any issues with date formats.
GetAsTimestamp (
ExecuteSQL (
"SELECT DISTINCT COALESCE(MaxDate,'')
FROM Survey
WHERE ClientIDCategory = ? "
; "" ; "";ClientIDCategory )
)
Also, you need to change the ShowInPortal field to an unstored calc field with:
If ( GetAsNumber(Date) = MaxDateGroupSQL ; 1 ; 0 )
Then filter the portal on this field.
I can send you the sample file if you want.
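The logic of the MaxDateGroupSQL / ShowInPortal pair (compute the latest date per ClientIDCategory, flag the rows that carry it, then filter the portal on the flag) can be sketched in Python. The rows and their `pk`, `client_id_category` and `date` keys are hypothetical stand-ins for the related table; as with the calculation field, rows that tie on the latest date are all flagged.

```python
from datetime import date


def flag_latest(rows):
    """Set show_in_portal to 1 on rows holding their group's latest date, else 0."""
    # First pass: latest date per ClientIDCategory (the MaxDateGroupSQL value)
    max_date = {}
    for row in rows:
        key = row["client_id_category"]
        if key not in max_date or row["date"] > max_date[key]:
            max_date[key] = row["date"]
    # Second pass: the ShowInPortal flag
    for row in rows:
        latest = max_date[row["client_id_category"]]
        row["show_in_portal"] = 1 if row["date"] == latest else 0
    return rows


survey = [
    {"pk": 1, "client_id_category": "1Red", "date": date(2020, 1, 5)},
    {"pk": 2, "client_id_category": "1Red", "date": date(2020, 3, 1)},
    {"pk": 3, "client_id_category": "1Blue", "date": date(2020, 2, 1)},
    {"pk": 4, "client_id_category": "2Red", "date": date(2020, 1, 1)},
]
# The portal filter: keep only the flagged rows
portal = [r["pk"] for r in flag_latest(survey) if r["show_in_portal"] == 1]
print(portal)
```

Because the grouping key is the full ClientIDCategory value, "1Red" and "1Blue" stay separate groups here.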

Compare counts in PostgreSQL

I want to compare two results of queries on the same table by checking the resulting row count, but Postgres doesn't support column aliases in the WHERE clause.
select id from article where version=1308
and exists(
select count(ident) as count1 from artprice AS p1
where p1.valid_to<to_timestamp(1586642400000) or p1.valid_from>to_timestamp(1672441199000)
and p1.article=article.id
and p1.count1=(select count(ident) from artprice where article=article.id)
)
I also cannot use aggregate functions in the where clause, so
select id from article where version=1308
and exists(
select count(ident) as count1 from artprice AS p1
where p1.valid_to<to_timestamp(1586642400000) or p1.valid_from>to_timestamp(1672441199000)
and p1.article=article.id
and p1.count(ident)=(select count(ident) from artprice where article=article.id)
)
also doesn't work. Any ideas?
EDIT:
What I want to get are articles where every article price is outside of a valid range defined by validFrom and validTo.
I now changed the statement by negating the positive conditions:
Select distinct article.id from Article article, ArtPrice price
where
(
(article.version=?)
and
(
(
(
(
(not(price.valid_from>=?)) or (not(price.valid_to<=?))
)
and
(
(not(price.valid_from<=?)) or (not(price.valid_to>=?))
)
)
and
(
(not(price.valid_to>=?)) or (not(price.valid_to<=?))
)
)
and
(
(not(price.valid_from>=?)) or (not(price.valid_from<=?))
)
)
) and article.id=price.article
Probably not the very elegant solution, but it works.
Aggregates are not allowed in the WHERE clause, but the HAVING clause exists for them.
EDIT: What I want to get are articles where every article price is outside of a valid range defined by validFrom and validTo.
I think that bool_or() would be a good fit here when combined with range operations:
SELECT article.id
FROM Article AS article
JOIN ArtPrice AS price ON price.article = article.id
WHERE article.version = 1308
GROUP BY article.id
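HAVING NOT bool_or(tsrange(price.valid_from, price.valid_to)
                   && tsrange(to_timestamp(1586642400)::timestamp,
                              to_timestamp(1672441199)::timestamp))
This reads as "...those for which no price range overlaps the given range". Note that to_timestamp() takes seconds since the epoch, so the millisecond values from the question are written in seconds here, and its result is cast to timestamp to match tsrange().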
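The HAVING NOT bool_or(...) condition amounts to "group the prices by article and keep an article only if none of its prices overlaps the range". A Python sketch of that logic, assuming hypothetical price dicts with `article`, `valid_from` and `valid_to` keys, and using the same default bounds that two-argument tsrange() applies (lower inclusive, upper exclusive):

```python
from collections import defaultdict
from datetime import datetime


def articles_with_no_overlap(prices, range_start, range_end):
    """Articles none of whose [valid_from, valid_to) ranges overlap [range_start, range_end)."""
    by_article = defaultdict(list)  # GROUP BY article.id
    for p in prices:
        by_article[p["article"]].append(p)

    def overlaps(p):
        # Half-open ranges overlap when each one starts before the other ends
        return p["valid_from"] < range_end and range_start < p["valid_to"]

    # HAVING NOT bool_or(...): keep the article if no price overlaps
    return sorted(a for a, ps in by_article.items() if not any(overlaps(p) for p in ps))


prices = [
    {"article": 1, "valid_from": datetime(2019, 1, 1), "valid_to": datetime(2019, 6, 1)},
    {"article": 1, "valid_from": datetime(2020, 5, 1), "valid_to": datetime(2020, 9, 1)},
    {"article": 2, "valid_from": datetime(2018, 1, 1), "valid_to": datetime(2019, 1, 1)},
    {"article": 2, "valid_from": datetime(2023, 6, 1), "valid_to": datetime(2023, 12, 1)},
]
result = articles_with_no_overlap(prices, datetime(2020, 4, 12), datetime(2022, 12, 31))
print(result)
```

Article 1 is dropped because its second price overlaps the range. As with the inner JOIN in the query, articles without any price rows never appear in the result.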
Postgresql also supports the SQL OVERLAPS operator:
(price.valid_from, price.valid_to) OVERLAPS (to_timestamp(1586642400),
                                             to_timestamp(1672441199))
As a note, it operates on half-open intervals start <= time < end.
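To make the half-open rule concrete, here is a Python sketch of how the PostgreSQL documentation describes OVERLAPS: the earlier value of each pair is taken as the start, each period is the half-open interval start <= time < end, and a period whose start equals its end represents that single instant. Plain integers stand in for timestamps; this illustrates the documented semantics and is not PostgreSQL's implementation.

```python
def sql_overlaps(a1, a2, b1, b2):
    """Mimic (a1, a2) OVERLAPS (b1, b2) as described in the PostgreSQL docs."""
    # Either value of a pair may come first; the earlier one is the start
    s1, e1 = min(a1, a2), max(a1, a2)
    s2, e2 = min(b1, b2), max(b1, b2)

    def contains(start, end, t):
        # A zero-length period is a single instant; otherwise start <= t < end
        return t == start if start == end else start <= t < end

    if s1 == e1:
        return contains(s2, e2, s1)
    if s2 == e2:
        return contains(s1, e1, s2)
    # Two half-open periods overlap when each starts before the other ends
    return s1 < e2 and s2 < e1


print(sql_overlaps(1, 5, 5, 9), sql_overlaps(1, 5, 4, 9))
```

For example, sql_overlaps(1, 5, 5, 9) is False because the two periods only touch at 5, while sql_overlaps(1, 5, 4, 9) is True.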

Greatest N per group in Open SQL

Selecting the rows from a table by (partial) key with the maximum value in a particular column is a common task in SQL. This question has some excellent answers that cover a variety of approaches to it. Unfortunately I'm struggling to replicate this in my ABAP program.
None of the commonly used approaches seem to be supported:
Joining on a subquery is not supported in syntax: SELECT * FROM X as x INNER JOIN ( SELECT ... ) AS y
Using IN for a composite key is not supported in syntax as far as I know: SELECT * FROM X WHERE (key1, key2) IN ( SELECT key1 key2 FROM ... )
Left join to itself with smaller-than comparison is not supported, outer joins only support EQ comparisons: SELECT * FROM X AS x LEFT JOIN X as xmax ON x-key1 = xmax-key1 AND x-key2 < xmax-key2 WHERE xmax-key IS INITIAL
After trying each of these solutions in turn only to discover that ABAP doesn't seem to support them and being unable to find any equivalents I'm starting to think that I'll have no choice but to dump the data of the subquery to an itab.
What is the best practice for this common programming requirement in ABAP development?
First of all, a specific requirement would get you a better answer. As it happens, I bumped into this question while working on a program that uses 3 distinct methods of pseudo-grouping (while looking for alternatives), and ALL 3 can be used to answer your question, depending on what exactly you need to do. I'm sure there are more ways to do it.
For instance, you can pull maximum values within a group by simply selecting max( your_field ) and grouping by some fields, if that's all you need.
select bname, nation, max( date_from ) from adrp group by bname, nation. "selects the highest "from" date for each bname and nation
If you need to use that max value as a filter condition within a query, you can do it by pseudo-grouping with a subquery and a max within that subquery, like this (notice how I move the BNAME check into the subquery, which means I don't have to check both fields with the in (subquery) addition):
select ... from adrp as b_adrp "Pulls the latest person info for a user (some conditions are missing, but this is a part of an actual query)
where b_adrp~date_from in (
select max( date_from ) "Highest date_from where both dates are valid
from adrp where persnumber = b_adrp~persnumber and nation = b_adrp~nation and date_from <= @sy-datum )
The query above lets you select all user info from the base query (whereas the first one only returns aggregated, grouped data).
Finally, if you need to check based on a composite key and compare it to multiple aggregate function results, the implementation will heavily depend on the specifics of your requirement (and since your question has none, I'll provide a generic one). The easiest option is to use exists / not exists instead of in (subquery), in exactly the same way, and to form the subquery so that it checks for the existence of a specific key or condition rather than pulling a list (you can nest subqueries if you have to):
select * from bkpf where exists ( select 1 from bkpf as b where belnr = bkpf~belnr and gjahr = bkpf~gjahr group by belnr, gjahr having max( budat ) = bkpf~budat ) "Took an available example, that I had in testing program.
All 3 queries will get you max value of a column within a group and in fact, all 3 can use joins to achieve identical results.
Please find my answers below your questions.
Joining on a subquery is not supported in syntax: SELECT * FROM X as x INNER JOIN ( SELECT ... ) AS y
Putting the subquery in your WHERE condition should do the job: SELECT * FROM X AS x INNER JOIN Y AS y ON x~a = y~b WHERE EXISTS ( SELECT * FROM y WHERE ... )
Using IN for a composite key is not supported in syntax as far as I know: SELECT * FROM X WHERE (key1, key2) IN ( SELECT key1 key2 FROM ... )
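You have to split your WHERE clause: SELECT * FROM X WHERE key1 IN ( SELECT key1 FROM y ) AND key2 IN ( SELECT key2 FROM y ). Note that this is looser than a true composite-key match: it accepts any combination of a key1 value and a key2 value that each occur somewhere in y, even if they never occur together in the same row. An EXISTS subquery correlated on both keys keeps the pairing.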
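The difference between the split IN conditions and a real composite-key match is easiest to see on a tiny example. A Python sketch with hypothetical `key1`/`key2` rows for the tables X and y:

```python
def match_pairs(x_rows, y_rows):
    # (key1, key2) IN (SELECT key1, key2 FROM y): the pair must occur in one row of y
    pairs = {(r["key1"], r["key2"]) for r in y_rows}
    return [r for r in x_rows if (r["key1"], r["key2"]) in pairs]


def match_split(x_rows, y_rows):
    # key1 IN (SELECT key1 FROM y) AND key2 IN (SELECT key2 FROM y)
    key1s = {r["key1"] for r in y_rows}
    key2s = {r["key2"] for r in y_rows}
    return [r for r in x_rows if r["key1"] in key1s and r["key2"] in key2s]


y = [{"key1": "A", "key2": 1}, {"key1": "B", "key2": 2}]
x = [{"key1": "A", "key2": 1}, {"key1": "A", "key2": 2}, {"key1": "B", "key2": 2}]

print(len(match_pairs(x, y)), len(match_split(x, y)))
```

The split version also returns the row A/2, although y never contains that pair.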
Left join to itself with smaller-than comparison is not supported, outer joins only support EQ comparisons.
Yes, that's right at the moment.
Left join to itself with smaller-than comparison is not supported, outer joins only support EQ comparisons:
SELECT * FROM X AS x LEFT JOIN X as xmax ON x-key1 = xmax-key1 AND x-key2 < xmax-key2 WHERE xmax-key IS INITIAL
This is not true. This SELECT is perfectly valid:
SELECT b1~budat
INTO TABLE lt_bkpf
FROM bkpf AS b1
LEFT JOIN bkpf AS b2
ON b2~belnr < b1~belnr
WHERE b1~bukrs <> ''.
And it has been valid at least since 7.40 SP08, since July 2013, so it was also valid at the time you asked this question.
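The self-join pattern from the question keeps a row only when no other row in the same group has a larger value. A Python sketch of that logic, with hypothetical `key1` (group) and `key2` (value) fields; the nested loop is quadratic and only illustrates what the LEFT JOIN ... IS INITIAL filter computes, whereas the database can use indexes. Tied maxima are all kept, as they would be by the SQL.

```python
def greatest_per_group(rows, group_key, value_key):
    """Rows for which no other row in the same group has a larger value."""
    result = []
    for x in rows:
        # LEFT JOIN X AS xmax ON same group AND x.value < xmax.value
        # ... WHERE xmax is empty  ->  "no larger row exists"
        has_larger = any(
            other[group_key] == x[group_key] and x[value_key] < other[value_key]
            for other in rows
        )
        if not has_larger:
            result.append(x)
    return result


rows = [
    {"key1": "A", "key2": 1},
    {"key1": "A", "key2": 3},
    {"key1": "B", "key2": 2},
    {"key1": "B", "key2": 2},
]
print(greatest_per_group(rows, "key1", "key2"))
```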

MySQL select rows where date not between date

I have a booking system in which I need to select any available room from the database. The basic setup is:
table: room
columns: id, maxGuests
table: roombooking
columns: id, startDate, endDate
table: roombooking_room:
columns: id, room_id, roombooking_id
I need to select rooms that can fit the requested guests in, or select two (or more) rooms to fit the guests in (as defined by maxGuests, obviously using the lowest/closest maxGuests first)
I could loop through my date range and use this sql:
SELECT `id`
FROM `room`
WHERE `id` NOT IN
(
SELECT `roombooking_room`.`room_id`
FROM `roombooking_room`, `roombooking`
WHERE `roombooking`.`confirmed` =1
AND DATE(%s) BETWEEN `roombooking`.`startDate` AND `roombooking`.`endDate`
)
AND `room`.`maxGuests`>=%d
where %s is the looped date and %d is the number of guests to be booked in. But this will just return false if there are more guests than any room can take, and surely there must be a quicker way of doing this than looping in PHP and running the query for every date?
This is similar to part of the sql I was thinking of: Getting Dates between a range of dates but with Mysql
Solution, based on ircmaxwell's answer:
$query = sprintf(
"SELECT `id`, `maxGuests`
FROM `room`
WHERE `id` NOT IN
(
SELECT `roombooking_room`.`room_id`
FROM `roombooking_room`
JOIN `roombooking` ON `roombooking_room`.`roombooking_id` = `roombooking`.`id`
WHERE `roombooking`.`confirmed` =1
AND `roombooking`.`startDate` <= DATE('%s') AND `roombooking`.`endDate` >= DATE('%s')
)
AND `maxGuests` <= %d ORDER BY `maxGuests` DESC",
$endDate->toString('yyyy-MM-dd'), $startDate->toString('yyyy-MM-dd'), $noGuests);
$result = $db->query($query);
$result = $result->fetchAll();
$rooms = array();
$guests = 0;
foreach($result as $res) {
if($guests >= $noGuests) break;
$guests += (int)$res['maxGuests'];
$rooms[] = $res['id'];
}
Assuming that you want to place #Guests guests from #StartDate to #EndDate:
SELECT DISTINCT r.id
FROM room r
LEFT JOIN roombooking_room rbr ON r.id = rbr.room_id
LEFT JOIN roombooking rb ON rbr.roombooking_id = rb.id
WHERE COALESCE(#StartDate NOT BETWEEN rb.startDate AND rb.endDate, TRUE)
AND COALESCE(#EndDate NOT BETWEEN rb.startDate AND rb.endDate, TRUE)
AND #Guests <= r.maxGuests
should give you a list of all rooms that are free and can accommodate given number of guests for the given period.
NOTES
This query works only for single rooms, if you want to look at multiple rooms you will need to apply the same criteria to a combination of rooms. For this you would need recursive queries or some helper tables.
Also, COALESCE is there to take care of NULLs - if a room is not booked at all it would not have any records with dates to compare to, so it would not return completely free rooms. Date between date1 and date2 will return NULL if either date1 or date2 is null and coalesce will turn it to true (alternative is to do a UNION of completely free rooms; which might be faster).
With multiple rooms things get really interesting.
Is that scenario big part of your problem? And which database are you using i.e. do you have access to recursive queries?
EDIT
As I stated multiple times before, your way of looking for a solution (a greedy algorithm that looks at the largest free rooms first) is not optimal if you want the best fit between the required number of guests and the rooms.
So, if you replace your foreach with
$bestCapacity = 0;
$bestSolution = array();
for ($i = 1; $i <= pow(2,sizeof($result))-1; $i++) {
$solutionIdx = $i;
$solutionGuests = 0;
$solution = array();
$j = 0;
while ($solutionIdx > 0) :
if ($solutionIdx % 2 == 1) {
$solution[] = $result[$j]['id'];
$solutionGuests += $result[$j]['maxGuests'];
}
$solutionIdx = intval($solutionIdx/2);
$j++;
endwhile;
if (($solutionGuests <= $bestCapacity || $bestCapacity == 0) && $solutionGuests >= $noGuests) {
$bestCapacity = $solutionGuests;
$bestSolution = $solution;
}
}
print_r($bestSolution);
print_r($bestCapacity);
it will go through all possible combinations and find the solution that wastes the fewest places.
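Why largest-first can miss the best fit is easiest to see on a tiny case. A Python sketch comparing a greedy pick (in the spirit of the asker's foreach loop, simplified to plain capacities instead of room rows) with an exhaustive search over all combinations, like the loop above; the capacities are made up:

```python
from itertools import combinations


def greedy_largest_first(capacities, guests):
    """Take the largest rooms first until everyone fits (may still fall short)."""
    chosen, total = [], 0
    for cap in sorted(capacities, reverse=True):
        if total >= guests:
            break
        chosen.append(cap)
        total += cap
    return total, chosen


def best_fit_exhaustive(capacities, guests):
    """Try every combination; return (total, rooms) with the least spare capacity, or None."""
    best = None
    for size in range(1, len(capacities) + 1):
        for combo in combinations(capacities, size):
            total = sum(combo)
            if total >= guests and (best is None or total < best[0]):
                best = (total, list(combo))
    return best


rooms = [4, 3, 3]
print(greedy_largest_first(rooms, 6))
print(best_fit_exhaustive(rooms, 6))
```

With rooms for 4, 3 and 3 guests and 6 guests to place, the greedy pick uses 4 + 3 = 7 places, while the exhaustive search finds 3 + 3 = 6. Like the PHP loop, the search is exponential in the number of rooms, so it only suits short candidate lists.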
OK, first off, the inner query you're using is a Cartesian join, and will be VERY expensive. You need to specify join criteria (roombooking_room.roombooking_id = roombooking.id, for example).
Secondly, assuming that you have a range of dates, what can we say about it? Let's call the start of your range rangeStartDate and its end rangeEndDate.
Now, what can we say about any other range of dates that does not overlap this range at all? Its endDate must not lie between rangeStartDate and rangeEndDate. The same goes for its startDate. And rangeStartDate (and rangeEndDate, but we don't need to check it) cannot lie between startDate and endDate...
So, assuming %1$s is rangeStartDate and %2$s is rangeEndDate, a comprehensive where clause might be:
WHERE `roomBooking`.`startDate` NOT BETWEEN %1$s AND %2$s
AND `roomBooking`.`endDate` NOT BETWEEN %1$s AND %2$s
AND %1$s NOT BETWEEN `roomBooking`.`startDate` AND `roomBooking`.`endDate`
But there's a simpler way of saying that. The only way for one range to lie entirely outside another is for its startDate to be after rangeEndDate, or for its endDate to be before rangeStartDate.
So, assuming %1$s is rangeStartDate and %2$s is rangeEndDate, another comprehensive where clause might be:
WHERE `roomBooking`.`startDate` > %2$s
OR `roomBooking`.`endDate` < %1$s
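That condition describes the bookings that do not overlap. Your subquery, however, has to list the rooms whose bookings do overlap the range, so it needs the negation of that condition (startDate <= rangeEndDate AND endDate >= rangeStartDate). That brings your overall query to:
SELECT `id`
FROM `room`
WHERE `id` NOT IN
(
SELECT `roombooking_room`.`room_id`
FROM `roombooking_room`
JOIN `roombooking` ON `roombooking_room`.`roombooking_id` = `roombooking`.`id`
WHERE `roombooking`.`confirmed` =1
AND `roombooking`.`startDate` <= '%2$s'
AND `roombooking`.`endDate` >= '%1$s'
)
AND `room`.`maxGuests`>=%3$d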
There are other ways of doing this as well, so keep looking...
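The claim that the three NOT BETWEEN checks and the shorter startDate > rangeEndDate OR endDate < rangeStartDate test say the same thing can be checked by brute force. A Python sketch, assuming integer day numbers in place of dates and well-formed ranges (start <= end); BETWEEN is inclusive on both ends, as in SQL:

```python
from itertools import product


def between(x, lo, hi):
    # SQL BETWEEN is inclusive on both ends
    return lo <= x <= hi


def no_overlap_three_checks(start, end, range_start, range_end):
    return (not between(start, range_start, range_end)
            and not between(end, range_start, range_end)
            and not between(range_start, start, end))


def no_overlap_two_checks(start, end, range_start, range_end):
    return start > range_end or end < range_start


def forms_agree(max_day=6):
    """Compare both forms on every well-formed pair of ranges within 0..max_day."""
    days = range(max_day + 1)
    for s, e, rs, r_end in product(days, repeat=4):
        if s <= e and rs <= r_end:
            if no_overlap_three_checks(s, e, rs, r_end) != no_overlap_two_checks(s, e, rs, r_end):
                return False
    return True


print(forms_agree())
```

Because BETWEEN is inclusive, a booking that ends on the first day of the requested range counts as overlapping; whether that is right depends on whether endDate is the checkout day.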
SELECT rooms.id
FROM rooms LEFT JOIN bookings
ON bookings.room_id = rooms.id
AND <booking overlaps date range of interest>
WHERE bookings.id IS NULL AND <whatever else>
The overlap condition belongs in the ON clause: in the WHERE clause it would discard the NULL-extended rows that mark rooms without a conflicting booking. With it in ON, a room survives only if none of its bookings overlaps the range, and no GROUP BY is needed.
At worst, with suitable indexes, that should scan half the bookings.