Performing math operations on a grouped table (group-by)

My problem is not really tied to a specific programming language.
I have an exercise in ABAP, but the language itself is not very important.
Anyway, I have a table:
I need to compute the total cost of each position (after the SELECT, obviously).
Then the table will be grouped by two fields (MATNR and BUKRS), and for each group I need the MAX, the MIN and the AVERAGE of the positions' total cost.
I just need a simple algorithm (pseudo-code) to solve this problem.
I hope I was clear.

For table aggregations I find the AT functions within a LOOP very handy.
Sort your table by the fields that define your groups (the dimensions you need), and sort the values within each group in ascending or descending order.
The order of the fields is very important here, because AT NEW and AT END OF look for changes in the specified field and in all fields to the left of it in the row structure. So you process the table group by group and append the result of each group's aggregation to your result table.
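SORT lt_itab BY matnr bukrs. "AT NEW / AT END OF need the table sorted by the group fields (matnr left of bukrs)
LOOP AT lt_itab ASSIGNING <ls_itab>.
  AT NEW bukrs. "at first entry of combination matnr, bukrs
    ls_agg-matnr = <ls_itab>-matnr.
    ls_agg-bukrs = <ls_itab>-bukrs.
    lf_min = <ls_itab>-amount. "start MIN and MAX from the group's first value
    lf_max = <ls_itab>-amount.
  ENDAT.
  TRY.
      lf_sum = lf_sum + <ls_itab>-amount.
    CATCH cx_sy_arithmetic_overflow.
      lf_sum = 9999. "placeholder if the sum no longer fits into lf_sum
  ENDTRY.
  lf_count = lf_count + 1.
  IF <ls_itab>-amount > lf_max.
    lf_max = <ls_itab>-amount.
  ENDIF.
  IF <ls_itab>-amount < lf_min.
    lf_min = <ls_itab>-amount.
  ENDIF.
  AT END OF bukrs. "after last entry of combination matnr, bukrs
    ls_agg-avg = lf_sum / lf_count.
    ls_agg-max = lf_max.
    ls_agg-min = lf_min.
    APPEND ls_agg TO lt_agged.
    CLEAR: ls_agg, lf_sum, lf_count, lf_max, lf_min.
  ENDAT.
ENDLOOP.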
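Since the question asked for pseudo-code, here is the same sort-then-control-break idea as a minimal Python sketch. It assumes each row is a dict with hypothetical keys matnr, bukrs and amount (the position's total cost); itertools.groupby plays the role of AT NEW / AT END OF, which is why the rows are sorted by the group fields first.

```python
from itertools import groupby
from operator import itemgetter


def aggregate_by_group(rows):
    """Return one summary dict per (matnr, bukrs) with min, max and average amount."""
    group_key = itemgetter("matnr", "bukrs")
    summaries = []
    # groupby only merges *adjacent* rows, so sort by the group fields first
    # (the same reason the ABAP table must be sorted before AT NEW / AT END OF)
    for (matnr, bukrs), group in groupby(sorted(rows, key=group_key), key=group_key):
        amounts = [row["amount"] for row in group]
        summaries.append({
            "matnr": matnr,
            "bukrs": bukrs,
            "min": min(amounts),
            "max": max(amounts),
            "avg": sum(amounts) / len(amounts),
        })
    return summaries


# Hypothetical sample data: two positions for M1, one for M2
rows = [
    {"matnr": "M1", "bukrs": "1000", "amount": 10},
    {"matnr": "M2", "bukrs": "1000", "amount": 5},
    {"matnr": "M1", "bukrs": "1000", "amount": 30},
]
summary = aggregate_by_group(rows)
```

Each group yields exactly one summary row, which corresponds to the single APPEND inside AT END OF in the ABAP loop.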

Related

DAX: Distinct and then aggregate twice

I'm trying to create a Measure in Power BI using DAX that achieves the below.
The data set has four columns, Name, Month, Country and Value. I have duplicates so first I need to dedupe across all four columns, then group by Month and sum up the value. And then, I need to average across the Month to arrive at a single value. How would I achieve this in DAX?
I figured it out. The reply by #OscarLar was very close, but nested SUMMARIZE causes problems because it cannot aggregate values calculated dynamically within the query itself (https://www.sqlbi.com/articles/nested-grouping-using-groupby-vs-summarize/).
I kept the inner SUMMARIZE from #OscarLar's answer and replaced the outer SUMMARIZE with a GROUPBY. Here's the code that worked:
AVERAGEX(
    GROUPBY(
        SUMMARIZE(Data, Data[Name], Data[Month], Data[Country], Data[Value]),
        Data[Month],
        "Month_Value", SUMX(CURRENTGROUP(), Data[Value])
    ),
    [Month_Value]
)
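The working measure dedupes the rows, sums Value per month, then averages the monthly sums. A minimal Python sketch of the same three steps, assuming rows arrive as (name, month, country, value) tuples; it only illustrates the logic, not how the DAX engine evaluates the measure.

```python
from collections import defaultdict


def average_of_monthly_sums(rows):
    """rows: iterable of (name, month, country, value) tuples."""
    # Step 1: drop exact duplicates across all four columns (the inner SUMMARIZE)
    distinct_rows = set(rows)
    # Step 2: sum Value per Month (GROUPBY ... SUMX(CURRENTGROUP(), ...))
    monthly_totals = defaultdict(float)
    for _name, month, _country, value in distinct_rows:
        monthly_totals[month] += value
    if not monthly_totals:
        return None  # no data: nothing to average
    # Step 3: average the per-month totals (the outer AVERAGEX)
    return sum(monthly_totals.values()) / len(monthly_totals)
```

With hypothetical rows where one January row appears twice, the duplicate is counted once before the monthly sums are averaged.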
Not sure I completely understood the question, since you didn't provide example data or any DAX code you've already tried. Please do so next time.
I'm assuming that, for whatever reason, this cannot be done in Power Query and you have to use DAX. In that case, I think this will do what you described.
Create a calculated table called Data_reduced in which duplicate rows have been removed:
Data_reduced =
SUMMARIZE(
    'Data';
    [Name];
    [Month];
    [Country];
    [Value]
)
Then create the averaging measure like this:
AveragePerMonth =
AVERAGEX(
    SUMMARIZE(
        'Data_reduced';
        'Data_reduced'[Month];
        "Sum_month"; SUM('Data_reduced'[Value])
    );
    [Sum_month]
)
Here, Data is the name of your original table.

SSRS Grouping Summary - with Max not working

This is the data that comes back from the database:
Data Sample for one season (the report returns values for two):
As you can see, the data is grouped by Season, Theater and then Performance number, followed by the revenue and ticket columns.
The SSRS report has three grouping levels: pkg (another ID that groups the levels below), venue (the venue column) and perf_desc (the description column linked to perf_no).
It looks like this:
What I need to do is take the revenue column (a unique value) for each performance and return it in a separate column, so I use this formula:
sum(Max(Fields!perf_tix.Value, "perf_desc"))
This works great: it gives me the unique value for each performance and sums them up at the pkg level.
The catch is when I need to pull the data out by season.
I created a separate column that looks like this:
It's yellow because it's hidden and referenced elsewhere. The expression says: if the season value equals the parameter (the passed season value), take the per-performance tix values and sum them up. This also works great on the lower row, the pkg group row (light blue in my case).
=iif(Fields!season.Value = Parameters!season.Value, Sum(Max(Fields!perf_tix.Value, "perf_desc")), 0)
However, the row above it (the parent/header row) gives me the sum of both seasons' values, basically adding everything up. This is not what I want, and I don't understand why it's happening: the season value does not equal the passed parameter for the second season, so why is it being added to the grouped value?
How do I fix this?
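Since your aggregate function is inside your IIF function, the condition Fields!season.Value = Parameters!season.Value is evaluated only once, against the first record in the current scope. If that first record matches the parameter, all records in the scope are included, whatever their season.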
Moving the IIF inside the aggregates, so that the condition is checked for every row, might work:
=Sum(Max(IIF(Fields!season.Value = Parameters!season.Value, Fields!perf_tix.Value, 0), "perf_desc"))
It might be better if your report also grouped on the Venue; otherwise your count may include all values.
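To see why the header row picks up both seasons, here is a rough Python model of the two expressions. It assumes rows are dicts with hypothetical keys season, perf_desc and perf_tix mirroring the report fields; it is not how SSRS evaluates expressions internally, only a sketch of the scoping behaviour described above.

```python
def _sum_of_per_performance_max(rows, value_of):
    """Sum(Max(value, "perf_desc")): take the max per performance, then add those up."""
    best = {}
    for row in rows:
        key = row["perf_desc"]
        value = value_of(row)
        best[key] = value if key not in best else max(best[key], value)
    return sum(best.values())


def condition_outside(rows, season_param):
    # IIF(Fields!season.Value = param, Sum(Max(...)), 0) in a header row:
    # Fields!season.Value resolves to the first row in scope, so the test runs once.
    if not rows or rows[0]["season"] != season_param:
        return 0
    return _sum_of_per_performance_max(rows, lambda r: r["perf_tix"])


def condition_inside(rows, season_param):
    # Sum(Max(IIF(season = param, perf_tix, 0), "perf_desc")): the test runs per row.
    return _sum_of_per_performance_max(
        rows, lambda r: r["perf_tix"] if r["season"] == season_param else 0
    )
```

With rows from two seasons under one header, condition_outside returns either everything (if the first row matches) or 0, while condition_inside returns only the selected season's total.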

How to extract just the IN count of a Tableau set

How can I extract the IN count portion of a Tableau set? I can see the IN/OUT counts when I drop the set into Text but can't figure out how to get at the IN value by itself.
Ultimately, I want to create a Pie Chart of three sets with just the IN counts as the measures.
I am using Tableau Public if that is a factor.
You have to be a little careful about specifying what you wish to count.
One way to think of a set is as a Boolean function that gives a value to each data record denoting whether that record is associated with the set.
Another way to think of a set is as a mathematical set whose members are a subset of the values of some discrete field (or of a tuple of fields).
The difference between the two views is really just a mindset, whether you consider the set as a Boolean function whose domain is a data row in the data source, or whose domain is the field on which the set definition is based.
Say you are looking at Tableau’s Superstore data set where each data record is a line item for a product attached to an order.
If your set is based on the field Region, say it's called [My Favorite Regions] and currently contains {“East”, “Central”}, do you want your count to be 2 (i.e. the number of regions in the set)? Or do you want your count to be in the tens of thousands (i.e. the number of line items on orders from the regions in the set)? Or something in between, maybe the number of distinct orders (i.e. order IDs) within the selected regions?
If you want to count data rows that are associated with the set, you can simply filter by the set and calculate SUM([Number of Records]). If you want to count the regions in the set even though the level of detail of the data is the order line item, then you’ll have to use either COUNTD to count the distinct regions, or some other approach that specifies what you want Tableau to count.
For example, put your set on the Filters shelf and show COUNTD([Region]), which could be slow for very large data sets. To get the same effect without an explicit filter, you can define an LOD calculation such as:
{ COUNTD(if [My Favorite Regions] then [Region] end) }
Or you could use a table calculation with the SIZE() function to do the calculation in the Tableau client instead of in the data source.
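To make the three possible "counts" concrete, here is a small Python sketch over Superstore-like rows. It assumes hypothetical keys region and order_id and models the set as a plain Python set of member regions; it illustrates the counting choices, not Tableau's implementation.

```python
def set_counts(rows, set_members):
    """rows: dicts with 'region' and 'order_id'; set_members: the regions IN the set."""
    # Treat the set as a per-row Boolean: is this row's region a member?
    in_rows = [row for row in rows if row["region"] in set_members]
    return {
        # regions in the set that occur in the data (like COUNTD([Region]) on IN rows)
        "regions": len({row["region"] for row in in_rows}),
        # line items from those regions (like SUM([Number of Records]) filtered to IN)
        "line_items": len(in_rows),
        # distinct orders from those regions
        "orders": len({row["order_id"] for row in in_rows}),
    }
```

The same filtered rows give three different answers, which is why it matters to decide what you want counted before building the pie chart.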
Not sure what your data looks like, but you could set a certain condition when creating a set, or split the IN/OUT into two different sets.
Here's a link to sets in Tableau.
You can do this with an IF statement:
IF [set] = TRUE THEN 1 ELSE 0 END
Then you could SUM this calculated field.
A common use is when you have a lot of categories and want to group the ones that aren't in a set (for example, a "Top N" set) into an 'Others' category.
To do this:
IF [set] = TRUE THEN [dimension] ELSE 'Others' END
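The 'Others' grouping can be sketched in Python as follows. It assumes a hypothetical dict of category totals and that the Top N set ranks categories by that total; the dict comprehension mirrors the IF [set] THEN [dimension] ELSE 'Others' END calculation.

```python
def bucket_top_n(totals_by_category, n):
    """Map each category to itself if it is in the top n by total, else to 'Others'."""
    ranked = sorted(totals_by_category, key=totals_by_category.get, reverse=True)
    top_n = set(ranked[:n])  # stand-in for a Top N set
    # IF [set] THEN [dimension] ELSE 'Others' END
    return {cat: (cat if cat in top_n else "Others") for cat in totals_by_category}


def regroup_totals(totals_by_category, n):
    """Add up the totals per bucket, so all non-top categories collapse into 'Others'."""
    regrouped = {}
    for cat, bucket in bucket_top_n(totals_by_category, n).items():
        regrouped[bucket] = regrouped.get(bucket, 0) + totals_by_category[cat]
    return regrouped
```

For example, with four hypothetical categories and n = 2, the two smallest categories are summed into a single 'Others' slice.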

Google Sheets Query - Sort By Date; Blanks/Null to the bottom

I'm running into an issue with a query in Google Sheets. The results of the array formula query are correct, but the column used to order the results (Col1) contains both blank/null cells and dates. As a result, when ordering by this column, the blank/null values are listed before the dates. Is it possible to have the dates ranked first and push the blank/null cells to the bottom?
Ordering by DESC will not work, as I want the earlier dates listed first. The blank/null cells cannot be excluded from the results either (they correspond to tasks without deadlines, which must still be listed).
The formula I am currently using is:
=ARRAYFORMULA(QUERY({DATA RANGE},"SELECT Col1 WHERE Col2 = X OR Col3 = X ORDER BY Col1 LIMIT 10",0))
It seems like there should be an easy way to achieve this, but I cannot find anything on the topic in other forums. Any help would be greatly appreciated.
Use SORT()
I believe for your example you could make it work like so:
=SORT(ARRAYFORMULA(QUERY({DATA RANGE},"SELECT Col1 WHERE Col2 = X OR Col3 = X",0)), 1, 1) (untested)
If your LIMIT 10 is important, then I think you could wrap the whole thing in another query and re-add the LIMIT.
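The ordering this answer is after (dates ascending, blanks last, and the limit applied only after sorting) can be sketched in Python. This assumes blanks arrive as None and dates as datetime.date objects; it only illustrates the ordering rule, not how Sheets implements SORT() or QUERY().

```python
from datetime import date


def sort_blanks_last(values, limit=None):
    """Sort dates ascending with blanks (None) at the end, then apply an optional limit."""
    # The key's first element is False for dates and True for blanks, so blanks sort last;
    # the second element orders the dates (date.min is only a filler for blanks).
    ordered = sorted(values, key=lambda v: (v is None, v if v is not None else date.min))
    # Apply the limit only after sorting, like re-adding LIMIT in an outer query
    return ordered if limit is None else ordered[:limit]
```

Applying the limit before sorting would risk cutting off dated tasks in favour of blank ones, which is why the LIMIT belongs in the outer step.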
Illustrated Example:
Range That Needs Querying and Sorting
Formula
Simple version defining a range in which the header is omitted:
=SORT(QUERY(A2:B7, "select *"), 1, 1)
Version that handles headers:
={tabname!A1:B1;SORT(QUERY(tabname!A2:B7, "select *"), 1, 1)}
This version creates an array combining the header row and the data rows so it can sort the data rows independently of the header.
Queried and Sorted Results
Breakdown of Formula Components
Array: {[range 1]; [range 2]}
SORT(): SORT([range], [column to sort on], [sort ascending: TRUE/FALSE or 1/0])
QUERY(): QUERY([range], "[query]")

Running total using window function in SQL has same result for same data

Every reference I found on how to compute a cumulative sum / running total says it's better to use a window function, so I did this:
select grandtotal,sum(grandtotal)over(order by agentname) from call
But I realized that the results are only correct as long as each row's value is different. Here is the result:
Is there any way to fix this?
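You might want to review the documentation on window specifications. When ORDER BY is given, the default frame is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, which defines the frame by the values in the ORDER BY column: all rows with the same agentname are peers and get the same total. You want ROWS BETWEEN, which counts physical rows: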
select grandtotal,
sum(grandtotal) over (order by agentname rows between unbounded preceding and current row)
from call;
Alternatively, you could include an id column in the sort to guarantee uniqueness and not have to deal with the issue of equal key values.
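To illustrate the difference between the two frames, here is a Python sketch that computes both, assuming rows are hypothetical (agentname, grandtotal) pairs. Python's stable sort keeps tied rows in input order; in SQL the order among rows with the same agentname is not guaranteed, which is exactly why adding an id column to the ORDER BY helps.

```python
def running_totals(rows, frame="ROWS"):
    """rows: (agentname, grandtotal) pairs; returns the running sum per row ordered by agentname."""
    ordered = sorted(rows, key=lambda r: r[0])  # ORDER BY agentname
    if frame == "ROWS":
        # ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW: stop at this physical row
        totals, running = [], 0
        for _name, amount in ordered:
            running += amount
            totals.append(running)
        return totals
    if frame == "RANGE":
        # RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW: include all peers
        # (rows with the same agentname), so tied rows get the same total
        return [sum(a for n, a in ordered if n <= name) for name, _ in ordered]
    raise ValueError("frame must be 'ROWS' or 'RANGE'")
```

With two rows for the same agent, the RANGE version repeats the combined total on both rows, which is the symptom described in the question.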