Power BI natural left outer join issues with deleted rows in right table

I have two tables:
1) Table 1: one column with a date value
2) Table 2: two columns, a date column and a business value column
I am trying to use DAX in Power BI to create a new table, using a left outer join to fill in the missing dates in my second table.
First table:
| Date |
| ---------- |
| 2015-05-01 |
| 2015-06-01 |
| 2015-07-01 |
| 2015-08-01 |
Second table :
| Date | Value |
| -----------|--------- |
| 2015-05-01 | 5 |
| 2015-05-01 | 5 |
| 2015-06-01 | 6 |
| 2015-07-01 | 7 |
DAX code to create the new table:
Table =
VAR table4 =
    SELECTCOLUMNS ( Table1, "Date", Table1[Date] & "" )
VAR table5 =
    SELECTCOLUMNS ( Table2, "value", Table2[value], "Date", Table2[Date] & "" )
RETURN
    NATURALLEFTOUTERJOIN ( table4, table5 )
This is returning :
| Date | Value |
| -----------|--------- |
| 2015-05-01 | 5 |
| 2015-06-01 | 6 |
| 2015-07-01 | 7 |
| 2015-08-01 | NA |
But I want:
| Date | Value |
| -----------|--------- |
| 2015-05-01 | 5 |
| 2015-05-01 | 5 |
| 2015-06-01 | 6 |
| 2015-07-01 | 7 |
| 2015-08-01 | NA |
I am not sure why it is removing the second row for 2015-05-01 (value 5).
I need both values for the month of May to remain in the table.
Any ideas? Thanks a lot.

I contacted the Microsoft development team in August 2019 (via Marco Russo at SQLBI); they confirmed that this behavior was caused by a bug and promised to fix it in an upcoming release.
I have tested the November 2019 release of Power BI Desktop and can confirm that the bug is indeed fixed.
My test code:
T1 = DATATABLE("Date", INTEGER, {{1}, {2}, {3}, {4}})
T2 = DATATABLE( "Date", INTEGER, "Value", INTEGER, {{1, 5}, {1,5}, {2, 6}, {3, 7}})
Test =
VAR T3 = SELECTCOLUMNS(T1, "Date", T1[Date]*1)
VAR T4 = SELECTCOLUMNS(T2, "Date", T2[Date]*1, "Value", T2[Value])
RETURN
NATURALLEFTOUTERJOIN(T3, T4)

As documented: https://learn.microsoft.com/en-us/dax/naturalinnerjoin-function-dax
The function joins the rows whose common (natural) columns match in both tables and then adds the extra columns.
You need to ask yourself why you need this row in your data, because when your model is correct, Power BI will do the work for you. In your situation you would generate one extra row for each date where a value does not exist; do you need this?
I recreated your use case and built the correct model.
Next, I created a table with dates and values and, on Date, selected the option to show values with no data.
It is just a matter of how you want to present your data. I show a summary over the values that fall on the same date.
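To make the expected join semantics concrete, here is a minimal Python sketch of a left outer join that keeps duplicate rows from the right table, using the data from the question. It is an illustration only, not how DAX implements NATURALLEFTOUTERJOIN; the helper name `left_outer_join` is hypothetical.

```python
def left_outer_join(left_rows, right_rows, key):
    """Left outer join of two lists of dicts on a shared key column."""
    # Index the right table by key, keeping every row (duplicates included).
    right_by_key = {}
    for row in right_rows:
        right_by_key.setdefault(row[key], []).append(row)

    # Non-key columns of the right table, used to pad unmatched left rows.
    right_columns = [c for row in right_rows[:1] for c in row if c != key]

    joined = []
    for left in left_rows:
        matches = right_by_key.get(left[key])
        if matches:
            # One output row per matching right row.
            joined.extend({**left, **match} for match in matches)
        else:
            # No match: keep the left row and fill the right columns with None.
            joined.append({**left, **{c: None for c in right_columns}})
    return joined


table1 = [{"Date": d} for d in ("2015-05-01", "2015-06-01", "2015-07-01", "2015-08-01")]
table2 = [
    {"Date": "2015-05-01", "Value": 5},
    {"Date": "2015-05-01", "Value": 5},
    {"Date": "2015-06-01", "Value": 6},
    {"Date": "2015-07-01", "Value": 7},
]

result = left_outer_join(table1, table2, "Date")
for row in result:
    print(row["Date"], row["Value"])
```

Both 2015-05-01 rows survive, and 2015-08-01 appears once with an empty value, which matches the output the question asks for.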

Related

Flatten Postgres left join query result with dynamic values into one row

I have two tables, products and product_attributs. One product can have one or many attributes, which are filled in through a dynamic web form (name and value inputs) and added by the user as needed. For example, for a drill the user could decide to add two attributes: color=blue and power=100 watts. Another product could have three or more different attributes, and yet another could have no special attributes at all.
products
| id | name | identifier | identifier_type | active
| ----------|--------------|-------------|------------------|---
| 1 | Drill | AD44 | barcode | true
| 2 | Polisher | AP211C | barcode | true
| 3 | Jackhammer | AJ2133 | barcode | false
| 4 | Screwdriver | AS4778 | RFID | true
product_attributs
|id | name | value | product_id
|----------|--------------|-------------|----------
|1 | color | blue | 1
|2 | power | 100 watts | 1
|3 | size | 40 cm | 2
|4 | energy | electrical | 3
|4 | price | 35€ | 3
So the attributes can be anything and are set dynamically by the user. I need to generate a CSV report that contains all products with their attributes. Not having much experience with SQL, I wrote the following basic query:
SELECT pr.name, pr.identifier_type, pr.identifier, pr.active, att.name, att.value
FROM products as pr
LEFT JOIN product_attributs att ON pr.id = att.product_id
As you know, the result contains as many rows per product as the product has attributes, which is not ideal for reporting. The ideal would be this:
|name | identifier_type | identifier | active | name | value | name | value
|-----------|-----------------|------------|--------|--------|-------|------ |------
|Drill | barcode | AD44 | true | color | blue | power | 100 w
|Polisher | barcode | AP211C | true | size | 40 cm | null | null
|Jackhammer | barcode | AJ2133 | false | energy | elect | price | 35 €
|Screwdriver| barcode | AS4778 | true | null | null | null | null
Here I only showed a maximum of two attributes per product, but there could be more if needed. I did some research and came across pivoting with the crosstab function in Postgres, but the problem is that it requires static values, which does not match my need.
Thanks a lot for your help, and sorry for any duplicates.
Thanks to Laurenz Albe for the help: array_agg solved my problem. Here is the query, in case anyone is interested:
SELECT
pr.name, pr.description, pr.identifier_type, pr.identifier,
pr.internal_identifier, pr.active,
ARRAY_TO_STRING(ARRAY_AGG (oa.name || ' = ' || oa.value),', ') attributs
FROM
products pr
LEFT JOIN product_attributs oa ON pr.id = oa.product_id
GROUP BY
pr.name, pr.description, pr.identifier_type, pr.identifier,
pr.internal_identifier, pr.active
ORDER BY
pr.name;
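For readers who want to see what the aggregation produces without a database, here is a rough Python sketch of the same idea: group the attributes per product and join them into one "name = value" string. The sample rows are copied from the question (only the columns shown there); the function name `flatten_products` is hypothetical, and this mirrors the result of the query rather than how Postgres evaluates it.

```python
from collections import defaultdict

products = [
    {"id": 1, "name": "Drill", "identifier": "AD44", "identifier_type": "barcode", "active": True},
    {"id": 2, "name": "Polisher", "identifier": "AP211C", "identifier_type": "barcode", "active": True},
    {"id": 3, "name": "Jackhammer", "identifier": "AJ2133", "identifier_type": "barcode", "active": False},
    {"id": 4, "name": "Screwdriver", "identifier": "AS4778", "identifier_type": "RFID", "active": True},
]
product_attributs = [
    {"name": "color", "value": "blue", "product_id": 1},
    {"name": "power", "value": "100 watts", "product_id": 1},
    {"name": "size", "value": "40 cm", "product_id": 2},
    {"name": "energy", "value": "electrical", "product_id": 3},
    {"name": "price", "value": "35€", "product_id": 3},
]


def flatten_products(products, attributes):
    """Return one row per product with all attributes joined into one string."""
    # Collect "name = value" labels per product id, in input order.
    labels = defaultdict(list)
    for attr in attributes:
        labels[attr["product_id"]].append(f"{attr['name']} = {attr['value']}")

    rows = []
    for product in sorted(products, key=lambda p: p["name"]):
        row = {k: v for k, v in product.items() if k != "id"}
        # Products without attributes get an empty string, like the LEFT JOIN case.
        row["attributs"] = ", ".join(labels.get(product["id"], []))
        rows.append(row)
    return rows


report = flatten_products(products, product_attributs)
for row in report:
    print(row["name"], "|", row["attributs"])
```

Each product ends up on a single row regardless of how many attributes it has, which is what makes the result convenient for a CSV export.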

How to get non-aggregated measures?

I calculate my metrics with SQL and publish the resulting table to Tableau Server. Afterward, I use this data source to create charts and dashboards.
For one analysis, I have already calculated the measures per day with SQL. When I use the resulting table in Tableau, it aggregates these measures with SUM by default. However, I don't want the SUM or AVG of the averages, or the SUM of the percentiles.
What I want is the result I get when I do not select the date dimension and do not GROUP BY date in SQL, as in the query below.
Here is the query:
SELECT
-- date,
COUNT(DISTINCT id) AS count_of_id,
AVG(timediff_in_sec) AS avg_timediff,
PERCENTILE_CONT(0.25) WITHIN GROUP(ORDER BY timediff_in_sec) AS percentile_25,
PERCENTILE_CONT(0.50) WITHIN GROUP(ORDER BY timediff_in_sec) AS percentile_50
FROM
(
--subquery
) AS t1
-- GROUP BY date
Here are the first rows of the resulting table (calculated per day):
+------------+--------------+-------------+---------------+---------------+
| date | avg_timediff | count_of_id | percentile_25 | percentile_50 |
+------------+--------------+-------------+---------------+---------------+
| 10/06/2020 | 61,65186364 | 22 | 8,5765 | 13,3015 |
| 11/06/2020 | 127,2913333 | 3 | 15,6045 | 17,494 |
| 12/06/2020 | 306,0348214 | 28 | 12,2565 | 17,629 |
| 13/06/2020 | 13,2664 | 5 | 11,944 | 13,862 |
| 14/06/2020 | 16,728 | 7 | 14,021 | 17,187 |
| 15/06/2020 | 398,6424595 | 37 | 11,893 | 19,271 |
| 16/06/2020 | 293,6925152 | 33 | 12,527 | 17,134 |
| 17/06/2020 | 155,6554286 | 21 | 13,452 | 16,715 |
| 18/06/2020 | 383,8101429 | 7 | 266,048 | 493,722 |
+------------+--------------+-------------+---------------+---------------+
How can I achieve the desired output above?
Drag them all into the Dimensions list; they will then be static dimensions. For your use case you could also just drag the Date field to Rows. Aggregating a single value, which is what you have for each date, returns the same value whatever the aggregation type.
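The last point can be checked with a small Python sketch. It takes a few daily averages from the table in the question and applies the usual aggregations to a group that holds exactly one value; the helper name `aggregate_single_value` is hypothetical and this only illustrates the arithmetic, not Tableau's internals.

```python
# Daily averages copied from the first rows of the table in the question.
daily_avg_timediff = {
    "10/06/2020": 61.65186364,
    "11/06/2020": 127.2913333,
    "12/06/2020": 306.0348214,
}


def aggregate_single_value(value):
    """Apply the usual aggregations to a group that holds one value."""
    group = [value]
    return {
        "SUM": sum(group),
        "AVG": sum(group) / len(group),
        "MIN": min(group),
        "MAX": max(group),
    }


for day, value in daily_avg_timediff.items():
    results = aggregate_single_value(value)
    # Every aggregation collapses to the original daily value.
    print(day, results, all(r == value for r in results.values()))
```

This only holds while Date is in the view. Once the date is removed, the aggregation runs over many daily values and the results diverge.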

Tableau - Calculated field for difference between date and maximum date in table

I have the following table that I have loaded in Tableau (It has only one column CreatedOnDate)
+-----------------+
| CreatedOnDate |
+-----------------+
| 1/1/2016 |
| 1/2/2016 |
| 1/3/2016 |
| 1/4/2016 |
| 1/5/2016 |
| 1/6/2016 |
| 1/7/2016 |
| 1/8/2016 |
| 1/9/2016 |
| 1/10/2016 |
| 1/11/2016 |
| 1/12/2016 |
| 1/13/2016 |
| 1/14/2016 |
+-----------------+
I want to be able to find the maximum date in the table, compare it with every date in the table and get the difference in days. For the above table, the maximum date in table is 1/14/2016. Every date is compared to 1/14/2016 to find the difference.
Expected Output
+-----------------+------------+
| CreatedOnDate | Difference |
+-----------------+------------+
| 1/1/2016 | 13 |
| 1/2/2016 | 12 |
| 1/3/2016 | 11 |
| 1/4/2016 | 10 |
| 1/5/2016 | 9 |
| 1/6/2016 | 8 |
| 1/7/2016 | 7 |
| 1/8/2016 | 6 |
| 1/9/2016 | 5 |
| 1/10/2016 | 4 |
| 1/11/2016 | 3 |
| 1/12/2016 | 2 |
| 1/13/2016 | 1 |
| 1/14/2016 | 0 |
+-----------------+------------+
My goal is to create this Difference calculated field. I am struggling to find a way to do this using DATEDIFF.
Any help would be appreciated!
woodhead92, this approach would work, but it means you have to use table calculations. A much more flexible approach (available since v9) is Level of Detail (LOD) expressions.
First, define the maximum date for the whole dataset with a calculated field called MaxDate LOD:
{FIXED : MAX(CreatedOnDate) }
This will always calculate the maximum date in the table. It also ignores regular filters; if you need the result to reflect them, make sure you add those filters to context.
Then you can use pretty much the same calculated field, but no need for ATTR or Table Calculations:
DATEDIFF('day', [CreatedOnDate], [MaxDate LOD])
Hope this helps!
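To make the logic of the two calculated fields easy to check against the expected output, here is a short Python sketch of the same computation: one maximum over the whole column, then a per-row difference in days. The variable names are illustrative and this is not Tableau code.

```python
from datetime import date

# The CreatedOnDate column from the question: 1/1/2016 through 1/14/2016.
created_on = [date(2016, 1, day) for day in range(1, 15)]

# Equivalent of the FIXED LOD: a single maximum over the whole table.
max_date = max(created_on)

# Equivalent of DATEDIFF('day', [CreatedOnDate], [MaxDate LOD]) for each row.
differences = [(d, (max_date - d).days) for d in created_on]

for created, diff in differences:
    print(created.isoformat(), diff)
```

The first row yields 13 and the last row 0, as in the expected output table.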

Join column with timestamps where value is maximum

I have a table that looks like
+-------+-----------+
| value | timestamp |
+-------+-----------+
and I'm trying to build a query that gives a result like
+-------+-----------+------------+------------------------+
| value | timestamp | MAX(value) | timestamp of max value |
+-------+-----------+------------+------------------------+
so that the result looks like
+---+----------+---+----------+
| 1 | 1.2.1001 | 3 | 1.1.1000 |
| 2 | 5.5.1021 | 3 | 1.1.1000 |
| 3 | 1.1.1000 | 3 | 1.1.1000 |
+---+----------+---+----------+
but I got stuck on joining the column with the corresponding timestamps.
Any hints or suggestions?
Thanks in advance!
For further information (if that helps):
In the real project the max-values are grouped by month and day (with group by clause, which works btw), but somehow I got stuck on joining the timestamps for max-values.
EDIT
Cross joins are a good idea, but I want to have them grouped by month e.g.:
+---+----------+---+----------+
| 1 | 1.1.1101 | 6 | 1.1.1300 |
| 2 | 2.6.1021 | 5 | 5.6.1000 |
| 3 | 1.1.1200 | 6 | 1.1.1300 |
| 4 | 1.1.1040 | 6 | 1.1.1300 |
| 5 | 5.6.1000 | 5 | 5.6.1000 |
| 6 | 1.1.1300 | 6 | 1.1.1300 |
+---+----------+---+----------+
EDIT 2
I've added a fiddle with some sample data and an example of the current query.
http://sqlfiddle.com/#!1/efa42/1
How to add the corresponding timestamp to the maximum?
Try a cross join with two sub queries, the first one selects all records, the second one gets one row that represents the time_stamp of the max value, <3;"1000-01-01"> for example.
SELECT col_value, col_timestamp, max_col_value, col_timestamp_of_max_value
FROM table1
CROSS JOIN
(
    SELECT max(col_value) AS max_col_value, col_timestamp AS col_timestamp_of_max_value
    FROM table1
    GROUP BY col_timestamp
    ORDER BY max_col_value DESC
    LIMIT 1
) A -- One row that represents the time_stamp of the max value, i.e. <3;"1000-01-01">
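Use window functions, since you are using PostgreSQL:
SELECT *, max(value) OVER (), first_value(timestamp) OVER (ORDER BY value DESC) FROM table
That gives you, in every row, the maximum over all values and the timestamp that belongs to that maximum. (Note that max(timestamp) OVER () would return the latest timestamp, not the timestamp of the maximum value.) To get this per month, add a PARTITION BY on the month to both OVER clauses.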
http://www.postgresql.org/docs/9.1/static/tutorial-window.html
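The per-month variant requested in the question's edit can be sketched in plain Python to show the intended result. The sample rows are taken from that edit and are assumed to be in day.month.year form; the helper names `month_of` and `attach_monthly_max` are hypothetical.

```python
def month_of(timestamp):
    # Timestamps in the question look like "day.month.year", e.g. "5.6.1000".
    return int(timestamp.split(".")[1])


def attach_monthly_max(rows):
    """For each (value, timestamp) row, append the max value of its month
    and the timestamp at which that max occurred."""
    best = {}  # month -> (max value, timestamp of max value)
    for value, timestamp in rows:
        month = month_of(timestamp)
        if month not in best or value > best[month][0]:
            best[month] = (value, timestamp)
    return [(value, timestamp) + best[month_of(timestamp)] for value, timestamp in rows]


rows = [
    (1, "1.1.1101"),
    (2, "2.6.1021"),
    (3, "1.1.1200"),
    (4, "1.1.1040"),
    (5, "5.6.1000"),
    (6, "1.1.1300"),
]

for row in attach_monthly_max(rows):
    print(row)
```

Rows from January carry (6, "1.1.1300") and rows from June carry (5, "5.6.1000"), matching the table in the edit.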

Finding the last seven days in a time series

I have a spreadsheet with column A which holds a timestamp and updates daily. Column B holds a value. Like the following:
+--------------------+---------+
| 11/24/2012 1:14:21 | $487.20 |
| 11/25/2012 1:14:03 | $487.20 |
| 11/26/2012 1:14:14 | $487.20 |
| 11/27/2012 1:14:05 | $487.20 |
| 11/28/2012 1:13:56 | $487.20 |
| 11/29/2012 1:13:57 | $487.20 |
| 11/30/2012 1:13:53 | $487.20 |
| 12/1/2012 1:13:54 | $492.60 |
+--------------------+---------+
What I am trying to do is get the average of the last 7, 14, 30 days.
I have been playing with the GoogleClock() function in order to filter the dates in column A, but I can't seem to find a way to subtract 7 days from TODAY. I suspect FILTER will also help, but I am a little bit lost.
There are a few ways to go about this; one way is to return an array of values with a QUERY function (this assumes a header row in row 1, and you want the last 7 dates):
=QUERY(A2:B;"select B order by A desc limit 7";0)
and you can wrap this in whatever aggregation function you like:
=AVERAGE(QUERY(A2:B;"select B order by A desc limit 7";0))
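For comparison, the same calculation can be sketched in Python on the sample rows from the question. `average_last_n` mirrors the QUERY approach above (the latest n rows), while `average_since` is a variant that filters by a date cutoff, closer to the "TODAY minus 7 days" idea in the question. Both function names are hypothetical.

```python
from datetime import datetime, timedelta

rows = [
    ("11/24/2012 1:14:21", 487.20),
    ("11/25/2012 1:14:03", 487.20),
    ("11/26/2012 1:14:14", 487.20),
    ("11/27/2012 1:14:05", 487.20),
    ("11/28/2012 1:13:56", 487.20),
    ("11/29/2012 1:13:57", 487.20),
    ("11/30/2012 1:13:53", 487.20),
    ("12/1/2012 1:13:54", 492.60),
]


def parse(timestamp):
    # Column A format in the question: month/day/year hour:minute:second.
    return datetime.strptime(timestamp, "%m/%d/%Y %H:%M:%S")


def average_last_n(rows, n):
    """Average of column B over the n most recent rows."""
    latest = sorted(rows, key=lambda r: parse(r[0]), reverse=True)[:n]
    return sum(value for _, value in latest) / len(latest)


def average_since(rows, now, days):
    """Average of column B over rows stamped within `days` days before `now`."""
    cutoff = now - timedelta(days=days)
    recent = [value for timestamp, value in rows if parse(timestamp) >= cutoff]
    return sum(recent) / len(recent) if recent else None


print(round(average_last_n(rows, 7), 2))
print(round(average_since(rows, datetime(2012, 12, 2), 2), 2))
```

The two approaches agree only when there is exactly one row per day; with gaps or multiple updates per day, the date cutoff is the one that really means "last 7 days".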