I need to calculate the monthly average in Power Query based on historical prices. I created a key column that gives the month number (based on the pricing date in the same row), so I can use it to identify which pricing dates should be included in each average.
The equivalent in Excel would be using AVERAGEIF,
i.e. =AVERAGEIF('pricing dates', 'column key', 'historical prices')
Appreciate any info or tips on this.
Wouldn't this be a straightforward Group By?
let
    Source = Excel.CurrentWorkbook(){[Name="Input"]}[Content],
    Typed = Table.TransformColumnTypes(Source,{{"Pricing Date", type date}, {"Column key", Int64.Type}, {"Price", type number}}),
    GroupBy = Table.Group(Typed, {"Column key"}, {{"AveragePrice", each List.Average([Price]), type number}})
in
    GroupBy
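For comparison, the same Group By can be sketched in pandas (column names taken from the M query above; the sample data is made up):

```python
import pandas as pd

# Hypothetical sample of the "Input" table from the M query above
df = pd.DataFrame({
    "Pricing Date": pd.to_datetime(["2023-01-05", "2023-01-20", "2023-02-10"]),
    "Column key": [1, 1, 2],
    "Price": [10.0, 20.0, 30.0],
})

# Group by the month key and average the prices, like Table.Group + List.Average
monthly_avg = (df.groupby("Column key", as_index=False)["Price"]
                 .mean()
                 .rename(columns={"Price": "AveragePrice"}))
print(monthly_avg)
```

Each distinct key value yields one row with the mean price, exactly what the M query's Table.Group step produces.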
I have a list of timestamps spanning multiple dates (no sym, just timestamps). There can be 1000-2000 of them at a time.
What's the most performant way to hit an hdb and get the closest price available for each timestamp?
select from hdbtable where date = x can return over 60 million rows.
Doing this for each date and then an aj on top performs very poorly.
Any suggestions are welcome
The most performant way to aj, assuming the HDB follows the standard convention of being date-partitioned with the `p# attribute on sym, is
aj[`sym`time;select sym,time,other from myTable where …;select sym,time,price from prices where date=x]
There should be no additional filters/where-clause on the prices table other than date.
You say you have no syms, just timestamps, but what does that mean? Do you want the price of all syms at each timestamp, or the last price of any sym at each timestamp? The former is easy: join your timestamps to your distinct sym list and use that as the "left" table in the aj. The latter is harder, because the HDB data likely isn't fully sorted on time; it's likely sorted by sym and then time. In that case you might again join your timestamps to your distinct sym list, aj for the price of all syms, and from that result take the one with the max time.
So I guess it depends on a few factors. More info might help.
EDIT: suggestion based on further discussion:
targetTimes:update targetTime:time from ([]time:"n"$09:43:19 10:27:58 13:12:11 15:34:03);
res:aj0[`sym`time;(select distinct sym from trade where date=2021.01.22)cross targetTimes;select sym,time,price from trade where date=2021.01.22];
select from res where not null price,time=(max;time)fby targetTime
sym time targetTime price
----------------------------------------------------
AQMS 0D09:43:18.999937967 0D09:43:19.000000000 4.5
ARNA 0D10:27:57.999842638 0D10:27:58.000000000 76.49
GE 0D15:34:02.999979520 0D15:34:03.000000000 11.17
HAL 0D13:12:10.997972224 0D13:12:11.000000000 18.81
This gives the price of whichever sym is closest to your targetTime. Then you would peach this over multiple dates:
{targetTimes: ...;res:aj0[...];select from res ...}peach mydates;
Note that what's making this complicated is your requirement that it be the price of any sym that's closest to your sym-less targetTimes. This seems strange - usually you would want the price of sym(s) as of a particular time, not the price of anything closest to a particular time.
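Outside q, the "last price at or before each target time, regardless of sym" lookup can be sketched with pandas merge_asof (sample values borrowed from the result table above; all names are hypothetical):

```python
import pandas as pd

# Hypothetical trade prices, one row per (time, sym), sorted by time
prices = pd.DataFrame({
    "time": pd.to_datetime(["2021-01-22 09:43:18.9", "2021-01-22 10:27:57.9",
                            "2021-01-22 13:12:10.9", "2021-01-22 15:34:02.9"]),
    "sym": ["AQMS", "ARNA", "HAL", "GE"],
    "price": [4.5, 76.49, 18.81, 11.17],
})

# The sym-less target timestamps
targets = pd.DataFrame({
    "time": pd.to_datetime(["2021-01-22 09:43:19", "2021-01-22 10:27:58",
                            "2021-01-22 13:12:11", "2021-01-22 15:34:03"]),
})

# For each target time, take the last price at or before it (aj-style backward join)
res = pd.merge_asof(targets.sort_values("time"), prices.sort_values("time"),
                    on="time", direction="backward")
print(res)
```

Like aj, merge_asof requires both sides sorted on the join column and matches each left row to the most recent right row not after it.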
You can use multithreading to optimize your query, with each thread being assigned a date to process, essentially utilising more than just one core:
{select from hdbtable where date = x} peach listofdates
More info on multithreading and on peach can be found in the kdb+ reference documentation.
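The same divide-by-date pattern, sketched in Python with a thread pool standing in for peach (query_date is a hypothetical stand-in for the per-date select):

```python
from concurrent.futures import ThreadPoolExecutor
from datetime import date, timedelta

def query_date(d):
    # Stand-in for `select from hdbtable where date = d`; dummy payload
    return {"date": d, "rows": 42}

dates = [date(2021, 1, 22) + timedelta(days=i) for i in range(5)]

# Like peach: each worker handles one date; map returns results in input order
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(query_date, dates))
```

As with peach, the win comes from each worker touching only one date partition, so per-date work proceeds in parallel rather than serially.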
I have a very basic flat file with sales by date and product names. I need to create a field for the first sales day where sales are greater than 100 units.
I tried {FIXED [Style Code]: MIN([Prod Cal Activity Date])} but that just gives me the first day in the data the style code exists.
I also tried IF ([Net Sales Units]>200) THEN {FIXED [Style Code]: MIN([Prod Cal Activity Date])} END but that also gives me the first day in the data the style code exists.
Note: data exists prior to the sales date.
You can use the following calculation:
MIN(IF([Net Sales Units]>100) THEN [Prod Cal Activity Date] ELSE #2100-01-01# END)
The IF([Net Sales Units]>100) THEN [Prod Cal Activity Date] ELSE #2100-01-01# END part of the calculation converts the date into a very high value (year 2100 in the example) for all the cases where sales were 100 units or fewer. Once this is done, you can simply take the minimum of the calculated date to get the desired result. If you need this by style code, you can wrap the calculation in a FIXED expression.
A few ways to simplify further if you like. They don't change the meaning.
You don't need parentheses around boolean expressions as you would in C.
You can eliminate the ELSE clause altogether. The if expression will default to null in cases where the condition was false. Aggregation functions like MIN(), MAX(), SUM() etc silently ignore nulls, so you don't have to come up with some default future date.
So MIN(IF [Net Sales Units] > 100 THEN [Prod Cal Activity Date] END) is exactly equivalent, just a few fewer characters to read.
The next possible twist has a bit of analytic value beyond just saving keystrokes.
You don't need to hard code the choice of aggregation function into the calculation. You could instead name your calculated field something like High Sales Activity Date defined as just
if [Net Sales Units] > 100 then [Prod Cal Activity Date] end
This field just holds the date for records with high sales, and is null for records with low sales. But by leaving the aggregation function out of the calculation, you have more flexibility to use it in different ways. For example, you could
Calculate the earliest (i.e. Min) high sales date as requested originally
Calculate the latest high sales date using Max
Filter to only dates with high sales by keeping the non-null values
Calculate the number of high sales dates using COUNTD
Simple little filtering calculations like this can be very useful - so called because the embedded if statement effectively filters out values that don't match the condition. There are still null values for the other records, but since aggregation functions ignore nulls, you can think of them as effectively filtered out by the calculation.
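A minimal pandas sketch of the same filtered-calculation idea (column names copied from the question; the data is made up):

```python
import pandas as pd

sales = pd.DataFrame({
    "Style Code": ["A", "A", "A", "B"],
    "Prod Cal Activity Date": pd.to_datetime(
        ["2023-01-01", "2023-01-05", "2023-01-09", "2023-01-02"]),
    "Net Sales Units": [40, 150, 300, 90],
})

# The "filtering calculation": keep the date only where units > 100, else null
sales["High Sales Activity Date"] = sales["Prod Cal Activity Date"].where(
    sales["Net Sales Units"] > 100)

# min() skips nulls, so this is the first high-sales day per style code
first_high = sales.groupby("Style Code")["High Sales Activity Date"].min()
```

Style "A" gets its first over-100 date, while style "B" (which never exceeds 100) comes back null rather than picking up a spurious date.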
The dataset is university rankings and I have a column 'world rank' and 'year'. I want to create a new field called 'rank difference' to see the difference in rank of universities from 2018 to 2011. Eg:
Name Year World Rank
Harvard 2011 4
Harvard 2018 5
For the above, rank difference would be -1. The data set contains a lot of universities and I am not sure how to perform LOD or any other solution for this.
You can use “if” inside a calculation to return a value in certain cases, and to evaluate to null otherwise. One phrase for this is a filtered calculation. Since aggregation calculations like min(), max(), avg() etc silently ignore null values, you can then use filtered calculations inside aggregate calculations.
So assuming your data source has one row per university per year, and that you put [University] on some shelf as a dimension, then the following calculation will get the result you’re looking for. Since you only have one row per university per year, you could just as easily use max(), sum() or avg() instead of min().
min(if [Year] = 2018 then [World Rank] end) - min(if [Year] = 2011 then [World Rank] end)
To extend, you could use a user supplied parameter instead of hard coded start and end years, or an LOD calculation to find the earliest and latest records for each university in your data set.
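For readers working outside Tableau, the same per-university difference can be sketched in pandas (made-up sample data; note this follows the 2018-minus-2011 sign of the calculation above, so flip the subtraction if you want Harvard's example to read -1):

```python
import pandas as pd

ranks = pd.DataFrame({
    "Name": ["Harvard", "Harvard", "MIT", "MIT"],
    "Year": [2011, 2018, 2011, 2018],
    "World Rank": [4, 5, 3, 1],
})

# One row per university: years become columns, then subtract
wide = ranks.pivot(index="Name", columns="Year", values="World Rank")
wide["rank difference"] = wide[2018] - wide[2011]
```

The pivot plays the role the [University] dimension plays on the shelf: it guarantees one row per university before the two year columns are compared.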
I have a fact table housing different granularity (date grain)
Monthly
Daily
The month data can be accessed by filtering by end of month date or using YYYYMM date format. In OBIEE RPD repo, the fact is set to LAST Aggregation.
I want to perform Year to Date analysis. And I want to sum only month end dates.
Using the TODATE(Measure) function, it tends to sum up all the data throughout the month, e.g.:
Date        Amount  YTD TODATE(Amount)
31/01/2016  100     100
28/02/2016  200     300
14/03/2016  50      350*
31/03/2016  100     450
I want YTD to ignore the 50 and return 400, and likewise for any other date that falls mid-month. But if I select 14/03/2016 itself, I want it to return 350.
Thanks.
Alter the table to add a flag, something that flags Y if the record is at the specified monthly grain, and N if the record is not at the specified monthly grain.
In the logical layer, create two distinct LTSs with the first filtering on the flag for Y. This will be where you will calculate and source all your to date measures. The second LTS can either be filtered to N, or can be left to all the data depending on what you want to do with it.
The performance increases should come from the fact that any month measures you build off that monthly LTS will only hit records flagged as month, and will bypass all that other data that is not relevant. So if a user runs a report only asking for monthly measures, the query will automatically filter to that specific data.
What will happen is if a user selects your to date measure and a specific date measure on the same report, OBIEE should fire off two separate queries to get the data and stitch together based on common dimensions.
Could someone create this in the front end? Probably. You would have to do some sort of PERIODROLLING function, and tell it to aggregate at the month level, but I am afraid it may still roll those days up into a larger than desired number. A TODATE function will not work here.
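As a sanity check of the flag approach outside OBIEE, here is a pandas sketch using the sample rows from the question (MonthEndFlag is the hypothetical flag column described above):

```python
import pandas as pd

facts = pd.DataFrame({
    "Date": pd.to_datetime(["2016-01-31", "2016-02-28", "2016-03-14", "2016-03-31"]),
    "Amount": [100, 200, 50, 100],
    "MonthEndFlag": ["Y", "Y", "N", "Y"],
})

# YTD computed only over month-grain rows: the flag filters out 14/03's 50,
# mirroring the LTS that is filtered to flag = Y
month_grain = facts[facts["MonthEndFlag"] == "Y"].sort_values("Date").copy()
month_grain["YTD"] = month_grain["Amount"].cumsum()
```

The running total over the flagged rows is 100, 300, 400, which matches the desired behavior of ignoring the mid-month 50.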
I need to know the rate of booking of beds (like in a hotel): the number of beds (summed month by month) for a range of dates, for the bookings that are in the range (including the partial sum of bookings that are only partially in the range).
I created a "booking" fact table with a StartDate and an EndDate, with a measure "countSejoursDate" (count(rows)) and a measure "NbrOfBeds" (sum).
I created 2 "wizard time" dimensions (START_DATE and END_DATE) linked to the fact's StartDate and EndDate columns.
I also created a 3rd "wizard time" dimension called "Date" not linked to any fact.
While trying to get the result using the MDX below, I'm only able to retrieve the count of rows inside a range of dates... and even then, the value for the first day of each month is wrong!
with member nbsejsDate as AGGREGATE(
{NULL:LINKMEMBER([Date].[Calendrier].CURRENTMEMBER,[START_DATE].[Start_Calendrier])}
* {LINKMEMBER([DATE].[Calendrier].CURRENTMEMBER, [END_DATE].[End_Calendrier]):NULL}
, [Measures].[countSejoursDate])
select nbsejsDate
on 0
, [Date].[Calendrier].[Jour].&[2015-03-01]:[Date].[Calendrier].[Jour].&[2015-03-31] on 1
from [Cube]
It's a bit awkward, as what we have here is a many-to-many relation in the form of a start and end date. Making a correct calculation with MDX instead of modelling the many-to-many relation is tricky and very, very error prone.
There are different possibilities for solving this:
Use a Range (From - To) link type in the link of time dimension in the Facts.
Use a Javascript view to create a new column that is an array of dates (start/end). The many-to-many relation is then created on the fly.
This should make any calculation a lot easier, if I understood the problem correctly.
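If it helps, the on-the-fly many-to-many expansion can be sketched in pandas (hypothetical booking data; each booking is exploded into one row per occupied date, exactly what the array-of-dates column does):

```python
import pandas as pd

# Hypothetical bookings with a start date, an end date, and a bed count
bookings = pd.DataFrame({
    "BookingId": [1, 2],
    "StartDate": pd.to_datetime(["2015-03-30", "2015-03-28"]),
    "EndDate": pd.to_datetime(["2015-04-02", "2015-03-31"]),
    "NbrOfBeds": [2, 1],
})

# Build the array-of-dates column, then explode it: the many-to-many on the fly
bookings["Date"] = bookings.apply(
    lambda r: pd.date_range(r["StartDate"], r["EndDate"]), axis=1)
per_day = bookings.explode("Date")

# Beds occupied on each date; roll this up by month for the monthly rate
beds_per_day = per_day.groupby("Date")["NbrOfBeds"].sum()
```

Overlapping bookings simply contribute to the same date rows, so partial overlaps with the query range fall out naturally from the per-day grain.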
hope it helps