How to find difference of items in same column by group? - tableau-api

The dataset is university rankings and I have a column 'world rank' and 'year'. I want to create a new field called 'rank difference' to see the difference in rank of universities from 2018 to 2011. Eg:
Name Year World Rank
Harvard 2011 4
Harvard 2018 5
For the above, rank difference would be -1. The data set contains a lot of universities and I am not sure how to perform LOD or any other solution for this.

You can use “if” inside a calculation to return a value in certain cases, and to evaluate to null otherwise. One name for this is a filtered calculation. Since aggregation calculations like min(), max(), avg() etc. silently ignore null values, you can then use filtered calculations inside aggregate calculations.
So assuming your data source has one row per university per year, and that you put [University] on some shelf as a dimension, then the following calculation will get the result you’re looking for. Since you only have a row per university per year, you could just as easily use max(), sum() or avg() instead of min()
min(if [Year] = 2011 then [World Rank] end) - min(if [Year] = 2018 then [World Rank] end)
To extend, you could use a user supplied parameter instead of hard coded start and end years, or an LOD calculation to find the earliest and latest records for each university in your data set.
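The filtered-aggregation idea can also be sketched outside Tableau. The Python snippet below is only an illustration, assuming one row per university per year. The MIT rows and the helper names `filtered_min` and `rank_difference` are hypothetical, not anything Tableau provides. Passing 2011 as the first year and 2018 as the second reproduces the -1 from the question.

```python
# Sample rows: (university, year, world_rank). The MIT rows are made up.
rows = [
    ("Harvard", 2011, 4),
    ("Harvard", 2018, 5),
    ("MIT", 2011, 7),
    ("MIT", 2018, 3),
]


def filtered_min(values):
    """Minimum of the non-None values, or None if every value is None.

    This mirrors how an aggregate such as MIN() skips nulls.
    """
    present = [v for v in values if v is not None]
    return min(present) if present else None


def rank_difference(rows, minuend_year, subtrahend_year):
    """Per university: rank in minuend_year minus rank in subtrahend_year."""
    by_university = {}
    for name, year, rank in rows:
        by_university.setdefault(name, []).append((year, rank))

    result = {}
    for name, pairs in by_university.items():
        # "if [Year] = y then [World Rank] end" is None for every other year.
        first = filtered_min(r if y == minuend_year else None for y, r in pairs)
        second = filtered_min(r if y == subtrahend_year else None for y, r in pairs)
        if first is None or second is None:
            result[name] = None  # a missing year leaves the difference null
        else:
            result[name] = first - second
    return result


differences = rank_difference(rows, 2011, 2018)
```

Grouping by university here plays the role of putting [University] on a shelf as a dimension.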

Related

most performant way to get asof price given a list of timestamps

I have a list of timestamps spanning multiple dates ( no sym, just timestamps). These can be 1000/2000 at times, spanning multiple dates.
What's the most performant way to hit an hdb and get the closest price available for each timestamp?
select from hdbtable where date = x -> can be over 60mm rows.
To do this for each date and then an aj on top is very poor.
Any suggestions are welcome
The most performant way to aj, assuming the HDB follows the standard conventions of date-partitioned with `p# attribute on sym, is
aj[`sym`time;select sym,time,other from myTable where …;select sym,time,price from prices where date=x]
There should be no additional filters/where-clause on the prices table other than date.
You're saying you have no syms just timestamps but what does that mean? Does that mean you want the price of all syms at that timestamp or you want the last price of any sym at that timestamp? The former is easy as you can just join your timestamps to your distinct sym list and use that as the "left" table in the aj. The latter will not be as easy as the HDB data likely isn't fully sorted on time, it's likely sorted by sym and then time. In that case you might have to again join your timestamps to your distinct sym list and aj for the price for all syms and from that result take the one with the max time.
So I guess it depends on a few factors. More info might help.
EDIT: suggestion based on further discussion:
targetTimes:update targetTime:time from ([]time:"n"$09:43:19 10:27:58 13:12:11 15:34:03);
res:aj0[`sym`time;(select distinct sym from trade where date=2021.01.22)cross targetTimes;select sym,time,price from trade where date=2021.01.22];
select from res where not null price,time=(max;time)fby targetTime
sym  time                 targetTime           price
----------------------------------------------------
AQMS 0D09:43:18.999937967 0D09:43:19.000000000 4.5
ARNA 0D10:27:57.999842638 0D10:27:58.000000000 76.49
GE   0D15:34:02.999979520 0D15:34:03.000000000 11.17
HAL  0D13:12:10.997972224 0D13:12:11.000000000 18.81
This gives the price of whichever sym is closest to your targetTime. Then you would peach this over multiple dates:
{targetTimes: ...;res:aj0[...];select from res ...}peach mydates;
Note that what's making this complicated is your requirement that it be the price of any sym that's closest to your sym-less targetTimes. This seems strange - usually you would want the price of sym(s) as of a particular time, not the price of anything closest to a particular time.
You can use multithreading to optimize your query, with each thread being assigned a date to process, essentially utilising more than just one core:
{select from hdbtable where date = x} peach listofdates
More information on multithreading and on peach can be found in the kdb+ documentation.
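For readers less familiar with aj, the core as-of lookup ("last record at or before each target time") can be illustrated in Python. This is a minimal sketch of the idea only, not of how kdb+ implements aj. The function `asof_prices` and the sample ticks are made up for illustration, and times are plain seconds since midnight.

```python
from bisect import bisect_right

# Hypothetical ticks for one date, sorted by time: (seconds, sym, price).
ticks = [
    (34990.0, "GE", 11.10),
    (34998.9, "AQMS", 4.50),
    (37677.9, "ARNA", 76.49),
    (47530.9, "HAL", 18.81),
]


def asof_prices(ticks, target_times):
    """For each target time, return the last tick at or before it (any sym).

    Returns None for a target that precedes every tick.
    """
    times = [t for t, _, _ in ticks]  # ticks must already be sorted by time
    result = []
    for target in target_times:
        # bisect_right gives the number of ticks with time <= target.
        idx = bisect_right(times, target)
        result.append(ticks[idx - 1] if idx > 0 else None)
    return result


# 09:43:19, 10:27:58 and 13:12:11 expressed as seconds since midnight.
targets = [34999.0, 37678.0, 47531.0]
matches = asof_prices(ticks, targets)
```

The binary search relies on the ticks being sorted by time, which is the same reason the answer above points out that HDB data sorted by sym and then time cannot be searched on time alone.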

First date with sales greater than 100 in TABLEAU

I have a very Basic flat file with Sales by date and product names. I need to create a field for First sales day where sales are greater than 100 units.
I tried {FIXED [Style Code]: MIN([Prod Cal Activity Date])} but that just gives me the first day in the data the Style code Exists
I also tried IF ([Net Sales Units]>200) THEN {FIXED [Style Code]: MIN([Prod Cal Activity Date])}END but that also gives me the first day in the data the Style code Exists
DATA EXISTS PRIOR TO SALES DATE
You can use the following calculation:
MIN(IF([Net Sales Units]>100) THEN [Prod Cal Activity Date] ELSE #2100-01-01# END)
The IF([Net Sales Units]>100) THEN [Prod Cal Activity Date] ELSE #2100-01-01# END part of the calculation converts the date into a very high value (year 2100 in the example) for all the cases where sales were not more than 100 units. Once this is done, you can simply take the minimum of the calculated date to get the desired result. If you need this by style code, then you can add a FIXED expression at the beginning.
A few ways to simplify further if you like. They don't change the meaning.
You don't need parentheses around boolean expressions as you would in C.
You can eliminate the ELSE clause altogether. The if expression will default to null in cases where the condition was false. Aggregation functions like MIN(), MAX(), SUM() etc silently ignore nulls, so you don't have to come up with some default future date.
So MIN(IF [Net Sales Units] > 100 THEN [Prod Cal Activity Date] END) is exactly equivalent, just a few fewer characters to read.
The next possible twist has a bit of analytic value beyond just saving keystrokes.
You don't need to hard code the choice of aggregation function into the calculation. You could instead name your calculated field something like High Sales Activity Date defined as just
if [Net Sales Units] > 100 then [Prod Cal Activity Date] end
This field just holds the date for records with high sales, and is null for records with low sales. But by leaving the aggregation function out of the calculation, you have more flexibility to use it in different ways. For example, you could
Calculate the earliest (i.e. Min) high sales date as requested originally
Calculate the latest high sales date using Max
Filter to only dates with high sales by filtering special non-null values
Calculate the number of high sales dates using COUNTD
Simple little filtering calculations like this can be very useful. They are so called because the embedded if statement effectively filters out values that don't match the condition. There are still null values for the other records, but since aggregation functions ignore nulls, you can think of them as effectively filtered out by the calculation.
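The same pattern can be sketched in Python for anyone who wants to check the logic outside Tableau. This is only an illustration under assumed data: the sample rows and the helper names `high_sales_date` and `first_high_sales_date` are hypothetical.

```python
from datetime import date

# Hypothetical rows: (style_code, activity_date, net_sales_units).
sales = [
    ("A1", date(2020, 1, 1), 0),
    ("A1", date(2020, 1, 5), 150),
    ("A1", date(2020, 1, 9), 300),
    ("B2", date(2020, 2, 1), 40),
]


def high_sales_date(activity_date, units, threshold=100):
    """Row-level filtered value: the date if units exceed threshold, else None."""
    return activity_date if units > threshold else None


def first_high_sales_date(sales, threshold=100):
    """Earliest high-sales date per style code; None if it never happened."""
    first = {}
    for style, activity_date, units in sales:
        first.setdefault(style, None)
        d = high_sales_date(activity_date, units, threshold)
        # Skip None values, the way MIN() ignores nulls.
        if d is not None and (first[style] is None or d < first[style]):
            first[style] = d
    return first


first_dates = first_high_sales_date(sales)
```

Keeping `high_sales_date` separate from the aggregation matches the suggestion above: the same row-level value could feed a max or a distinct count instead of a min.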

Calculated Field to Count While Between Dates

I am creating a Tableau visualization for floor stock in our plant. We have a column for incoming date, quantity, and outgoing date. I am trying to create a visualization that sums the quantity but only while between the 2 columns.
So for example, if we have 9 parts in stock that arrived on 9/1 and are scheduled to ship out on 9/14, I would like this visualization to include these 9 parts in the sum only while they are in our stock between those 2 dates. Here is an example of some of the data I am working with.
Incoming Date  Quantity  Outgoing Date
4/20/2018      006       5/30/2018
4/20/2018      017       5/30/2018
4/20/2018      008       5/30/2018
6/29/2018      161       9/7/2018
Create a new calculation:
if [ArrivalDate] >= #2018-09-01# and [ArrivalDate] < #2018-09-15#
and [Shipdate] < #2018-09-15#
then [MEASUREofStock] else 0 end
Here is a solution using UNIONs, written before Tableau added support for Unions (so it required custom SQL):
Volume of an Incident Queue at a Point in Time
For several years now, Tableau has supported Union directly, so now it is possible to get the same effect without writing custom SQL, but the concept is the same.
The main thing to understand is that you need a data row per event (per arrival or per departure) and a single date column, not two. That will let you calculate the net change in quantity per day, and you can then use a running total if you want to see the absolute quantity at the close of each day
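The event-per-row idea can be sketched in Python as follows. This is a rough illustration using the sample rows from the question. The function name `stock_on_hand` is hypothetical, and the sketch assumes that parts leave stock on their outgoing date, so they are no longer counted at the close of that day.

```python
from collections import defaultdict
from datetime import date
from itertools import accumulate

# Sample rows from the question: (incoming_date, quantity, outgoing_date).
stock = [
    (date(2018, 4, 20), 6, date(2018, 5, 30)),
    (date(2018, 4, 20), 17, date(2018, 5, 30)),
    (date(2018, 4, 20), 8, date(2018, 5, 30)),
    (date(2018, 6, 29), 161, date(2018, 9, 7)),
]


def stock_on_hand(stock):
    """Return [(event_date, quantity on hand at close of that day), ...]."""
    net_change = defaultdict(int)
    for incoming, quantity, outgoing in stock:
        net_change[incoming] += quantity  # arrival event adds stock
        net_change[outgoing] -= quantity  # departure event removes stock

    days = sorted(net_change)
    # Running total of the daily net changes.
    running = accumulate(net_change[d] for d in days)
    return list(zip(days, running))


levels = stock_on_hand(stock)
```

Each input row contributes two events, which is what the union of arrivals and departures produces. Only dates with at least one event appear in the result.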
There is no simple way to display the total quantity between the two dates without changing the input table structure. If you want to show all dates and the "eligible" quantity in each day, you should
Create a calendar table that has all dates from 1990-01-01 to 2029-12-31. (You can limit the dates displayed in the dashboard later by applying a date filter, but here you want to be safe and include all dates that may exist in your stock table.)
Left join the date table to stock table and calculate the eligible quantity in each day.
SELECT
a.date,
SUM(CASE WHEN b.quantity IS NULL THEN 0 ELSE b.quantity END) AS quantity
FROM date a
LEFT JOIN
stock b on a.date BETWEEN b.Incoming_Date AND b.Outgoing_Date
GROUP BY a.date
Import the output table to Tableau, and simply add dates and quantity to the chart.

OBIEE YTD Issues

I have a fact table housing different granularity (date grain)
Monthly
Daily
The month data can be accessed by filtering by end of month date or using YYYYMM date format. In OBIEE RPD repo, the fact is set to LAST Aggregation.
I want to perform Year to Date analysis. And I want to sum only month end dates.
Using the function TODATE(Measure), it tends to sum up all the data throughout the month, e.g.:
Date        Amount  YTD TODate(Amount)
31/01/2016  100     100
28/02/2016  200     300
14/03/2016  50      350*
31/03/2016  100     450
I want YTD to ignore the 50 and return 400, and likewise for any other dates that fall within a month. And if I select 14/03/2016, I want 350 to be returned.
Thanks.
Alter the table to add a flag, something that flags Y if the record is at the specified monthly grain, and N if the record is not at the specified monthly grain.
In the logical layer, create two distinct LTSs with the first filtering on the flag for Y. This will be where you will calculate and source all your to date measures. The second LTS can either be filtered to N, or can be left to all the data depending on what you want to do with it.
The performance increases should come from the fact that any month measures you build off that monthly LTS will only hit records flagged as month, and will bypass all that other data that is not relevant. So if a user runs a report only asking for monthly measures, the query will automatically filter to that specific data.
What will happen is if a user selects your to date measure and a specific date measure on the same report, OBIEE should fire off two separate queries to get the data and stitch together based on common dimensions.
Could someone create this in the front end? Probably. You would have to do some sort of PERIODROLLING function, and tell it to aggregate at the month level, but I am afraid it may still roll those days up into a larger than desired number. A TODATE function will not work here.
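To make the effect of the flag concrete, here is a small Python sketch of a year-to-date sum that reads only rows flagged as monthly grain, next to one that reads every row. It is an illustration of the filtering logic only, not of anything OBIEE generates. The function name `year_to_date` and the boolean flag column are assumptions, and the rows are the ones from the question.

```python
from datetime import date

# Rows from the question: (date, amount, is_month_grain).
# The flag corresponds to the proposed Y/N column.
facts = [
    (date(2016, 1, 31), 100, True),
    (date(2016, 2, 28), 200, True),
    (date(2016, 3, 14), 50, False),
    (date(2016, 3, 31), 100, True),
]


def year_to_date(facts, as_of, monthly_only):
    """Sum amounts in as_of's year up to and including as_of.

    With monthly_only=True, rows not flagged as monthly grain are skipped,
    which is what sourcing the measure from the filtered LTS would do.
    """
    total = 0
    for day, amount, is_month_grain in facts:
        if day.year != as_of.year or day > as_of:
            continue
        if monthly_only and not is_month_grain:
            continue
        total += amount
    return total


ytd_monthly = year_to_date(facts, date(2016, 3, 31), monthly_only=True)
ytd_all_rows = year_to_date(facts, date(2016, 3, 31), monthly_only=False)
```

With the flag applied, the total at 31/03/2016 is the 400 asked for in the question. Without it, the daily row is included and the total is 450.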

MDX - calculate one date dimension from another date dimension

I have a fact table that has 2 dates Invoice Date and Accounting Current Date. In order to get requested Revenue value I need to use combination of these two dates. For example, if I need YTD Revenue I need to select it like this:
(Note: I am writing SQL query because I am more familiar with it)
SELECT Revenue
FROM
Fact_Revenue
WHERE
Invoice_Date <= '2011-10-22'
and AccountingCurrent >= '2011-01'
and AccountingCurrent <= '2011-10'
Besides Revenue, this fact table has other information that I also need, but for calculating this other data I don't need Accounting Current Date. So my idea is to use only 1 date (Invoice Date) in the main MDX query (so that I can grab as much data with 1 query as I can), and for calculating Revenue I would like to use a Calculated Member in which I associate Accounting Current Date with the selected Invoice Date.
For example
SELECT {[Measure].[RevenueYTD],
[Measure].[RevenueMTD],
[Measure].[NumberOfInvoices],
[Measure].[NumberOfPolicies]}
ON COLUMNS,
{[People].Members} ON ROWS
FROM [Cube]
WHERE
[Invoice Date].[Date Hierarchy].[Date].&[2011-10-22]
In this case, [Measure].[RevenueYTD] and [Measure].[RevenueMTD] need to be limited by Accounting Current Date, and Invoice Date must be lower than the date from the query. On the other hand, I need [Measure].[NumberOfInvoices] and [Measure].[NumberOfPolicies] for a particular Invoice Date (or MTD Date, whatever), but without the involvement of Accounting Current Date.
Calculated member query should do something like this (this is more like algorithm):
ROUND(
SUM(
YTD([Accounting Current Date].[Date Hierarchy].CurrentMember),
[Measures].[Revenue]
),
2)
WHERE [Invoice Current Date].[Date Hierarchy] < [Invoice Current Date].[Date Hierarchy].CurrentMember
Navigating from one dimension to another is not trivial in MDX. In theory dimensions are independent, so the standard language is missing functions for doing this. You can use the StrToMember MDX function, but it's slow and a bit strange.
For your filters, let's start with the first one :
Invoice_Date <= '2011-10-22'
In MDX we'll have to create a set with the members matching the expression. This can be done using the Range set operator :
NULL:[Invoice Date].[Date Hierarchy].[Date].&[2011-10-22]
The other filter is easy to guess :
AccountingCurrent >= '2011-01' and AccountingCurrent <= '2011-10'
MDX version :
[Accounting Date].[Date Hierarchy].[Date].&[2011-01-31]:[Accounting Date].[Date Hierarchy].[Date].&[2011-10-30]
It's also possible to use the Filter MDX function if you need a different type of filter.
Now we need to take the pieces and build the query. One possible solution is using a set slicer and overwriting the values when you don't want the filter to be applied:
WITH
// here we're changing the 'selection' from the where clause
MEMBER [Measure].[NumberOfInvoices II] AS ([Accounting Date].[Date Hierarchy].defaultmember,[Measure].[NumberOfInvoices])
SELECT
.. axis here [Measure].[RevenueYTD] will be applying the filters defined in the where clause
FROM MyCube
WHERE {[Accounting Date].[Date Hierarchy].[Date].&[2011-01-31]:[Accounting Date].[Date Hierarchy].[Date].&[2011-10-30]}
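The intent of the two filters can be checked with a small Python sketch: revenue is restricted by both the invoice date and the accounting period range, while the invoice count ignores the accounting period, which is the role of the defaultmember override above. The rows, the "YYYY-MM" period strings and the function names are all hypothetical, and counting invoices up to the cutoff date is an assumption about what the measure should return.

```python
from datetime import date

# Hypothetical fact rows: (invoice_date, accounting_period "YYYY-MM", revenue).
fact_revenue = [
    (date(2011, 3, 15), "2011-03", 100.0),
    (date(2011, 10, 20), "2011-10", 250.0),
    (date(2011, 10, 21), "2011-11", 75.0),  # accounting period outside the range
    (date(2011, 11, 2), "2011-10", 40.0),   # invoiced after the cutoff
]


def revenue_ytd(rows, invoice_cutoff, period_from, period_to):
    """Revenue limited by invoice date AND accounting period range."""
    return sum(
        revenue
        for invoice_date, period, revenue in rows
        if invoice_date <= invoice_cutoff and period_from <= period <= period_to
    )


def number_of_invoices(rows, invoice_cutoff):
    """Invoice count limited by invoice date only (accounting period ignored)."""
    return sum(1 for invoice_date, _, _ in rows if invoice_date <= invoice_cutoff)


ytd = revenue_ytd(fact_revenue, date(2011, 10, 22), "2011-01", "2011-10")
invoices = number_of_invoices(fact_revenue, date(2011, 10, 22))
```

Comparing the period strings works here only because "YYYY-MM" sorts lexicographically in date order.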