How to model a Power BI date table when the database date table has duplicates due to each client having their own calendar? - tsql

In the database I have a date table. The date table has a calendar for each client, so if, for example, there are 10 clients and the calendar covers 5 years, there are 10 * 5 * 365 records in this table.
Example:
+---------+------------+------+------+----------+--------+
| Client  | Date       | FYYR | FYWK | CORPFYYR | CORPWK |
+---------+------------+------+------+----------+--------+
| Costco  | 01-06-2022 | 2023 | 1    | 2022     | 22     |
| Walmart | 01-06-2022 | 2022 | 22   | 2022     | 22     |
| Costco  | 02-06-2022 | 2023 | 1    | 2022     | 22     |
| Walmart | 02-06-2022 | 2022 | 22   | 2022     | 22     |
| Costco  | 03-06-2022 | 2023 | 1    | 2022     | 22     |
| Walmart | 03-06-2022 | 2022 | 22   | 2022     | 22     |
| Costco  | 04-06-2022 | 2023 | 1    | 2022     | 22     |
| Walmart | 04-06-2022 | 2022 | 22   | 2022     | 22     |
+---------+------------+------+------+----------+--------+
When I import this table into Power BI, it doesn't allow me to mark it as a date table (due to the duplicate dates).
Since it has duplicate dates, when I create a relationship from this table to the fact table, it gets created as an M:M relationship (Microsoft's documentation mentions that M:M reduces model performance).
On the report I have a slicer (on the client name from this date table) to ensure that only one client is selected, so that the calendar then doesn't have duplicates.
I cannot use DAX date/time intelligence functions because this table cannot be marked as a date table.
To solve this I could create 5 date tables from that table, mark them all as date tables, connect all of them to the fact table, and then have one report page per client. But I don't want to do this, as I don't want a separate report page per client.
What is the correct way to model such a date table in this scenario via SQL or PowerQuery or PowerBI? The end goal being that the table can be marked as a date table so that I can use date/time intelligence DAX.

Time intelligence functions won't work without a proper date table. In addition, a many-to-many relationship should be avoided if at all possible, as it will make the rest of your DAX very complicated.
A date table is by definition just a dimension with no duplicates and a full, contiguous range of dates covering an entire year. You can create this dimension from your fact table in Power Query (PQ).
The real question is why does each client get their own calendar? What is the difference between client 1's calendar and client 2's calendar?
Many to many relationships are "limited" relationships and do not behave like normal one-to-many relationships in a whole host of ways (e.g. no blank row for missing dimension keys). It is a very detailed subject and you're best reading from the experts here: https://www.sqlbi.com/articles/strong-and-weak-relationships-in-power-bi/
Regarding having a different calendar table per client, I think I understand now and the solution might be complicated. If you only have a few clients, I would be tempted to create these calendars as additional columns of a standard date table. e.g. Date - Day - Month - Year - Etc - Client Type 1 FY Start, Client Type 2 FY Start
Ideally there is some commonality between each client so you can genericise the special columns as I have done with Client Type rather than individual client.
It is common in PBI to create dimension tables from a fact table. You do this by referencing the fact table, removing the other columns, and removing duplicates; you are then left with a dimension table to join to your fact table in the model.
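If you would rather build that slim date dimension upstream in SQL (the question mentions T-SQL as an option), a minimal sketch could look like the following. dbo.FactSales is a placeholder for your actual fact table; note that to be marked as a date table in Power BI the column must also be contiguous, so you may need to fill gaps or generate a full calendar range instead of relying on the fact table alone:

-- One row per distinct date taken from the fact table (placeholder names)
SELECT DISTINCT CAST([Date] AS date) AS [Date]
FROM dbo.FactSales;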

To reduce the count of calendars, I recommend abstracting the different calendars from the customers. For example:
Customer A uses Calendar A
Customer B uses Calendar B
Customer C uses Calendar A
Then you should hopefully end up with just a couple of calendars.
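If it is not obvious which clients share a calendar, one quick check (a T-SQL sketch, assuming the date table from the question is called dbo.DateTable) is to compare each client's fiscal-year boundaries; clients that return identical rows here share a calendar:

-- Fiscal-year start and end per client; identical output rows mean a shared calendar
SELECT Client, FYYR, MIN([Date]) AS FYStart, MAX([Date]) AS FYEnd
FROM dbo.DateTable
GROUP BY Client, FYYR
ORDER BY Client, FYYR;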
In the next step I would add columns to the date table for each of those calendars. If the calendars differ only in week information, these would be WeekNumber, WeekYear, and so on.
Example:
| DateKey  | Week_C1 | WeekYear_C1 | WeekYearLabel_C1 | Week_C2 | WeekYear_C2 | WeekYearLabel_C2 |
|----------|---------|-------------|------------------|---------|-------------|------------------|
| 20230101 | 1       | 202301      | WYC1 01/2023     | 1       | 202301      | WYC2 01/2023     |
| 20230102 | 1       | 202301      | WYC1 01/2023     | 1       | 202301      | WYC2 01/2023     |
| ...      | ...     | ...         | ...              | ...     | ...         | ...              |
| 20230108 | 2       | 202302      | WYC1 02/2023     | 1       | 202301      | WYC2 01/2023     |
| 20230109 | 2       | 202302      | WYC1 02/2023     | 2       | 202302      | WYC2 02/2023     |
You end up with a date table with unique dates, but more columns. You will be able to mark this table as a date table, but you have to build your reports according to each customer's calendar by slicing and filtering the appropriate columns.
This might be a great use case for field parameters.
To transform your existing date table you can use this M query:
let
    Source = Csv.Document(File.Contents("C:\Data\example.csv"),[Delimiter=";", Columns=6, Encoding=1252, QuoteStyle=QuoteStyle.None]),
    #"Promoted Headers" = Table.PromoteHeaders(Source, [PromoteAllScalars=true]),
    #"Changed Type" = Table.TransformColumnTypes(#"Promoted Headers",{{"Client", type text}, {"Date", type date}, {"FYYR", Int64.Type}, {"FYWK", Int64.Type}, {"CORPFYYR", Int64.Type}, {"CORPWK", Int64.Type}}),
    // Pivot each fiscal column (FYYR, FYWK, CORPFYYR, CORPWK) into one column per client
    #"Duplicated Column" = Table.DuplicateColumn(#"Changed Type", "Client", "Client2"),
    #"Pivoted Column" = Table.Pivot(#"Duplicated Column", List.Distinct(#"Duplicated Column"[Client]), "Client2", "FYYR", List.Sum),
    #"Renamed Columns" = Table.RenameColumns(#"Pivoted Column",{{"Costco", "Costco_FYYR"}, {"Walmart", "Walmart_FYYR"}}),
    #"Duplicated Column1" = Table.DuplicateColumn(#"Renamed Columns", "Client", "Client2"),
    #"Pivoted Column1" = Table.Pivot(#"Duplicated Column1", List.Distinct(#"Duplicated Column1"[Client2]), "Client2", "FYWK", List.Sum),
    #"Renamed Columns1" = Table.RenameColumns(#"Pivoted Column1",{{"Costco", "Costco_FYWK"}, {"Walmart", "Walmart_FYWK"}}),
    #"Duplicated Column2" = Table.DuplicateColumn(#"Renamed Columns1", "Client", "Client2"),
    #"Pivoted Column2" = Table.Pivot(#"Duplicated Column2", List.Distinct(#"Duplicated Column2"[Client2]), "Client2", "CORPFYYR", List.Sum),
    #"Renamed Columns2" = Table.RenameColumns(#"Pivoted Column2",{{"Walmart", "Walmart_CORPFYYR"}, {"Costco", "Costco_CORPFYYR"}}),
    #"Pivoted Column3" = Table.Pivot(#"Renamed Columns2", List.Distinct(#"Renamed Columns2"[Client]), "Client", "CORPWK", List.Sum),
    #"Renamed Columns3" = Table.RenameColumns(#"Pivoted Column3",{{"Costco", "Costco_CORPWK"}, {"Walmart", "Walmart_CORPWK"}}),
    // Collapse to one row per date; List.Max ignores the nulls created by the pivots
    #"Grouped Rows" = Table.Group(#"Renamed Columns3", {"Date"}, {
        {"C_FYYR", each List.Max([Costco_FYYR]), Int64.Type},
        {"C_FYWK", each List.Max([Costco_FYWK]), Int64.Type},
        {"C_CORPFYYR", each List.Max([Costco_CORPFYYR]), Int64.Type},
        {"C_CORPWK", each List.Max([Costco_CORPWK]), Int64.Type},
        {"W_FYYR", each List.Max([Walmart_FYYR]), Int64.Type},
        {"W_FYWK", each List.Max([Walmart_FYWK]), Int64.Type},
        {"W_CORPFYYR", each List.Max([Walmart_CORPFYYR]), Int64.Type},
        {"W_CORPWK", each List.Max([Walmart_CORPWK]), Int64.Type}
    })
in
    #"Grouped Rows"
I tried it with your example data as a csv:
Client;Date;FYYR;FYWK;CORPFYYR;CORPWK
Costco;01-06-2022;2023;1;2022;22
Walmart;01-06-2022;2022;22;2022;22
Costco;02-06-2022;2023;1;2022;22
Walmart;02-06-2022;2022;22;2022;22
Costco;03-06-2022;2023;1;2022;22
Walmart;03-06-2022;2022;22;2022;22
Costco;04-06-2022;2023;1;2022;22
Walmart;04-06-2022;2022;22;2022;22
It turns the table from the long format above (one row per client per date) into the wide format with one row per date.
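If you would prefer to do the reshaping in the database rather than in Power Query, a sketch of the equivalent conditional aggregation in T-SQL (dbo.DateTable is a placeholder name; extend the CASE expressions to cover all of your clients):

-- One row per date, one set of fiscal columns per client
SELECT
    [Date],
    MAX(CASE WHEN Client = 'Costco'  THEN FYYR     END) AS C_FYYR,
    MAX(CASE WHEN Client = 'Costco'  THEN FYWK     END) AS C_FYWK,
    MAX(CASE WHEN Client = 'Costco'  THEN CORPFYYR END) AS C_CORPFYYR,
    MAX(CASE WHEN Client = 'Costco'  THEN CORPWK   END) AS C_CORPWK,
    MAX(CASE WHEN Client = 'Walmart' THEN FYYR     END) AS W_FYYR,
    MAX(CASE WHEN Client = 'Walmart' THEN FYWK     END) AS W_FYWK,
    MAX(CASE WHEN Client = 'Walmart' THEN CORPFYYR END) AS W_CORPFYYR,
    MAX(CASE WHEN Client = 'Walmart' THEN CORPWK   END) AS W_CORPWK
FROM dbo.DateTable
GROUP BY [Date];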

Related

Timezone and POSIXct handling in RpostgreSQL

I'm having an issue with datetime handling in RPostgreSQL. Specifically it relates to POSIXct objects with a UTC timezone being automatically adjusted to daylight saving during upload to a postgres database. A simple example:
library(RPostgreSQL)
example = data.frame(date=as.POSIXct('2016-08-14 15:50:00', tz='UTC'))
con = dbConnect(dbDriver("PostgreSQL"),
                dbname="mydb",
                host="localhost",
                port="5432",
                user="me",
                password="password")
dbWriteTable(con, name=c('myschema','mytable'), example, overwrite=T)
example2 = dbReadTable(con, name=c('myschema','mytable'))
dbDisconnect(con)
example2 # 2016-08-14 14:50:00
In this case the time is exported as 15:50 but read back in as 14:50, suggesting that British Summer Time daylight saving has been applied. I've tried adjusting my system settings to UTC, setting the timezone in R to UTC using Sys.setenv(TZ='UTC') and setting the timezone in Postgres to UTC using SET timezone TO 'UTC', all to no avail.
Does anybody know where in the process the conversion is likely to be happening and where dbWriteTable is taking its timezone from? Are there any suggestions on other settings that might need adjusting?
I also get strange issues with RPostgreSQL (with UTC somehow being UTC -4:00). But things seem fine using RPostgres.
Note that the time zone displayed in R is in local time. If you go into PostgreSQL (say, psql) after running the R code and SET TIME ZONE 'GMT';, you see that the 2016-08-14 16:50:00 displayed in R is actually stored in the database as 2016-08-14 15:50:00 UTC. In other words, 2016-08-14 16:50:00 displayed in R is correct for rubbish_alt in my example.
crsp=# SET TIME ZONE 'GMT';
SET
crsp=# SELECT * FROM rubbish;
 row.names |          date
-----------+------------------------
 1         | 2016-08-14 19:50:00+00
(1 row)
crsp=# SELECT * FROM rubbish_alt;
          date
------------------------
 2016-08-14 15:50:00+00
(1 row)
crsp=# \d rubbish
            Table "public.rubbish"
  Column   |           Type           | Modifiers
-----------+--------------------------+-----------
 row.names | text                     |
 date      | timestamp with time zone |
crsp=# \d rubbish_alt
          Table "public.rubbish_alt"
 Column |           Type           | Modifiers
--------+--------------------------+-----------
 date   | timestamp with time zone |
R code (note: setting Sys.setenv(PGHOST="myhost", PGDATABASE="mydb"), etc., elsewhere makes this reprex()-generated code runnable for anyone):
Sys.setenv(TZ='Europe/London')
# With RPostgreSQL ----
library(RPostgreSQL)
#> Loading required package: DBI
example <- data.frame(date=as.POSIXct('2016-08-14 15:50:00', tz='UTC'))
con = dbConnect(PostgreSQL())
dbWriteTable(con, 'rubbish', example, overwrite=TRUE)
#> [1] TRUE
example2 <- dbReadTable(con, name="rubbish")
dbDisconnect(con)
#> [1] TRUE
example2
#> date
#> 1 2016-08-14 20:50:00
# With RPostgres ----
library(RPostgres)
example <- data.frame(date=as.POSIXct('2016-08-14 15:50:00', tz='UTC'))
con = dbConnect(Postgres())
dbWriteTable(con, 'rubbish_alt', example, overwrite=TRUE)
example2 <- dbReadTable(con, name="rubbish_alt")
dbDisconnect(con)
example2
#> date
#> 1 2016-08-14 16:50:00
example2$date[1]
#> [1] "2016-08-14 16:50:00 BST"

Producing date from year and month values in PostgreSQL

Hello, I'm having two problems with converting a concatenated date value into an actual date.
I've tried looking here to convert the concatenated value with to_char(DATE ...) but I keep getting odd dates. I think it is because my month does not have zero padding in front of it.
This is my base query:
SELECT
    expiry_month,
    expiry_year,
    to_date(CONCAT(expiry_year, expiry_month), 'YYYY/MM') AS concatvalues
FROM thisTable
Here is an example of the data output:
expiry_month  expiry_year  concatvalues
9             2018         20189-01-01
1             2019         20191-01-01
5             2016         20165-01-01
3             2019         20193-01-01
10            2017         201710-01-01
2             2020         20202-01-01
I think I need to LPAD() my month value to get the correct date parsed. E.g. 01 not 1, and 05 not 5.
However when I try to LPAD the month values it does not work. I've tried:
lpad(to_char(expiry_month),2,'0'),
I get this error 'HINT: No function matches the given name and argument types. You might need to add explicit type casts.'
Which I don't understand because lpad is a function. Any suggestion on how to use LPAD()?
Thank you for the advice.
EDIT 1
I've tried to update the to_date() function with this code:
to_date(CONCAT(payment_cards.expiry_year || ' - ' || payment_cards.expiry_month || ' - 01'), 'YYYY-MM-01') and now it is throwing a different error:
ERROR: invalid value "- " for "MM" DETAIL: Value must be an integer.
I'm still thinking I need to pad the month date?
There's a '/' missing:
SELECT
    expiry_month,
    expiry_year,
    to_date(CONCAT(expiry_year, '/', expiry_month), 'YYYY/MM') AS the_start_of_year_month
FROM thisTable;
will produce:
expiry_month | expiry_year | the_start_of_year_month
-----------: | ----------: | :----------------------
9 | 2018 | 2018-09-01
1 | 2019 | 2019-01-01
5 | 2016 | 2016-05-01
3 | 2019 | 2019-03-01
10 | 2017 | 2017-10-01
2 | 2020 | 2020-02-01
The date format specifies a '/' that wasn't there in the input, so the whole text was taken as the year, and the month and day defaulted to 1/1. CONCAT('2018','9') was just returning '20189' (which is a valid year).
dbfiddle here
Use:
make_date(year int, month int, day int)
like:
make_date(expiry_year, expiry_month, 1)
Postgresql documentation
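Put into the original query, that would look something like this (assuming expiry_year and expiry_month are integer columns; cast them first if they are stored as text):

SELECT
    expiry_month,
    expiry_year,
    make_date(expiry_year, expiry_month, 1) AS the_start_of_year_month
FROM thisTable;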

How to add blank records for grouping based on formula in Crystal Reports

I have one table and group the records using a formula based on a string field that holds a time (HH:mm:ss).
The formula is as follows:
select Minute (TimeValue ({MASTER.Saat}))
case 0 to 14: ReplicateString ("0", 2-len(TOTEXT(Hour (TimeValue ({MASTER.Saat})),0))) & TOTEXT(Hour (TimeValue ({MASTER.Saat})),0) & ":00:00"
case 15 to 29: ReplicateString ("0", 2-len(TOTEXT(Hour (TimeValue ({MASTER.Saat})),0))) & TOTEXT(Hour (TimeValue ({MASTER.Saat})),0) & ":15:00"
case 30 to 44: ReplicateString ("0", 2-len(TOTEXT(Hour (TimeValue ({MASTER.Saat})),0))) & TOTEXT(Hour (TimeValue ({MASTER.Saat})),0) & ":30:00"
case 45 to 59: ReplicateString ("0", 2-len(TOTEXT(Hour (TimeValue ({MASTER.Saat})),0))) & TOTEXT(Hour (TimeValue ({MASTER.Saat})),0) & ":45:00"
Grouping actually works fine, but my problem is that if there is no data in the table for a period, I cannot show that period in the report.
As an example;
Let my data has 5 records as following:
11:01:03
11:16:07
11:28:16
12:18:47
12:22:34
My report gives the following result:
Period | Total Records
11:00:00 | 1
11:15:00 | 2
12:15:00 | 2
In this situation, I cannot show the periods that are missing from the table with 0 for Total Records. I need the output to look like this:
Period | Total Records
11:00:00 | 1
11:15:00 | 2
11:30:00 | 0
11:45:00 | 0
12:00:00 | 0
12:15:00 | 2
Thanks for all suggestions.
You can't group something that's not there. One way to solve this is to use a table that provides all of the intervals you want to look at (often called a date, time, or number table).
For your case, create a table that contains all your period values (24 x 4 = 96). Join the records you want to count to this table. In Crystal Reports, group by the period values; your result set will then also contain the periods without any joined records, and you can detect those and output a 0.
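If you are able to create that table in the data source, here is a sketch of a quarter-hour period table (written for SQL Server; adapt it to your database) that you can left-join your records to:

-- 96 quarter-hour labels: 00:00:00, 00:15:00, ..., 23:45:00
WITH Periods AS (
    SELECT CAST('00:00:00' AS time) AS Period
    UNION ALL
    SELECT DATEADD(MINUTE, 15, Period)
    FROM Periods
    WHERE Period < '23:45:00'
)
SELECT CONVERT(varchar(8), Period, 108) AS Period
FROM Periods;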
You may want to look at this question; it is similar to yours.

Calculate datediff based on the week of date instead of day of date

Currently my data has one row per install_cohort date, with columns for Install Cohort Week, install_cohort_month, and days_since_install (the original screenshot is not included here).
I have 3 sets of comparisons - Daily, Weekly, and Monthly. When comparing weekly cohorts against other weekly cohorts, and monthly cohorts against other monthly cohorts, you often run into the case where the cohort is not fully 'baked'. This means, for example, that the first cohort day in each week (i.e. install_cohort == 2/21/2016 with Install Cohort Week == Feb 21st) has more days_since_install than install_cohort == 2/27/2016 (the last day in the week of Install Cohort Week == Feb 21st).
When making comparisons between weeks this means not everyone has moved through the same days_since_install.
The goal is to filter the data such that every cohort has the same days_since_install which would get rid of the additional days_since_install that install_cohort == 2/21/2016 has over install_cohort == 2/27/2016 for example. I only want to make comparisons where each week's collection of install_cohorts has the same number of days_since_install.
OK, after a long discussion:
Create a calculated field with {fixed [install_cohort_month]: max([install_cohort])} and call it [max_date]
Create a second calculated field with datediff('day', [max_date], today())
That gives you a table similar to this
Month     | week      | install_cohort | max_date  | days_since_install
28/2/2016 | 7/2/2016  | 1/2/2016       | 28/2/2016 | 50
28/2/2016 | 28/2/2016 | 28/2/2016      | 28/2/2016 | 50
You can change the first calc to use the month field instead of the week field as well.
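If you would rather push this logic into the data source instead of a Tableau LOD, the equivalent in SQL would look roughly like this (table and column names are illustrative):

-- Latest cohort date per month, and days elapsed since that date
SELECT
    install_cohort,
    MAX(install_cohort) OVER (PARTITION BY install_cohort_month) AS max_date,
    DATEDIFF(DAY,
             MAX(install_cohort) OVER (PARTITION BY install_cohort_month),
             GETDATE()) AS days_since_install
FROM installs;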

Median by Year in Tableau

I have longitudinal data for thirty companies, and I want to create a trend line of the median company by year. Currently, I have a separate row in the file for each year of the company. For example:
+-----------+------+-------+
| Name | Year | Value |
+-----------+------+-------+
| Company A | 2014 | 2000 |
| Company A | 2013 | 2500 |
| Company B | 2014 | 3000 |
| Company B | 2013 | 2900 |
+-----------+------+-------+
I am imagining a graph that has year on the X axis and the value on the Y axis, with the data point being the median of all companies in a given year. What is the best way to do this? I have tried a number of things and still have not had any success.
If I understand you correctly, you'd like to see a chart that shows the median value for each year as well as the company that had that median value. Here's one way:
Make a calculated field to find the median value per year (let's call it Median Value per Year):
{ FIXED [Year] : MEDIAN(Value) }
Next make a calculated field to find the company which had that median value (let's call it Median Company):
ATTR(
IF [Median Value per Year] = [Value]
THEN [Name]
ELSE NULL
END
)
That IF statement checks whether the Value is equal to the median value we calculated earlier. If so, it returns the Name; otherwise, NULL. When we take the ATTR() of that, it will ideally¹ return the name of the company that had the median value.
Now you can place Year on the Columns shelf, MEDIAN(Value) on the Rows shelf, and (for example) put AGG(Median Company) on Label.
¹ If you have more than one company with that same value, then it will return "*".
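If you prefer to pre-compute the yearly median outside Tableau, the same aggregation can be done in SQL; this sketch uses the SQL Server form of PERCENTILE_CONT and an illustrative table name:

-- One row per year with the median value across companies
SELECT DISTINCT
    [Year],
    PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY [Value])
        OVER (PARTITION BY [Year]) AS MedianValue
FROM CompanyValues;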