QlikSense set analysis with 2 conditions on the same dimensions - qliksense

I need to display the number of customers who returned an order in a month "M" and then placed a new order within the 6 months following month "M". This has to be displayed in a bar chart in Qlik Sense.
I created a first measure with:
count(distinct [Account id])
then I added my condition for returned orders:
count({$<[Transactions Type] = {"RETURN"}>} distinct [Account id])
With that I get the number of customers who returned an order in month M ("Order Creation YearMonth").
I created a second measure with:
count(distinct [Account id])
then I added my condition for orders placed in the next 6 months:
Rangesum(above(count({$<[Transactions Type] = {"ORDER"}>} distinct [Account id]), 0, 6))
With that I get the number of customers who placed an order in the 6 months after month M ("Order Creation YearMonth").
On the x-axis, I use my dimension "Order Creation YearMonth".
My problem is that I don't know how to merge these 2 measures in a set analysis.
Do I have to use the p operator?
Or the * operator to combine the 2 conditions on "Transactions Type"? But then how can I add the rangesum? Or maybe this is the wrong approach altogether...
Thanks to any of you who can give me some tips to move forward on this crazy problem, my brain thanks you ! :)
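To make the target number concrete, here is a minimal sketch in Python (not Qlik syntax) of the logic described above. It assumes the data can be read as rows of (account id, transaction type, year-month); the sample rows and the function names are hypothetical. The point it illustrates is that the result is an intersection of two sets of accounts per month, which a sum of monthly distinct counts cannot give: such a sum counts an account once for every month in which it ordered and does not restrict the count to accounts that returned something in month M.

```python
from collections import defaultdict

def month_index(ym: str) -> int:
    """Convert 'YYYY-MM' to a running month number so months can be compared."""
    year, month = ym.split("-")
    return int(year) * 12 + int(month) - 1

def returners_who_reordered(transactions, window=6):
    """For each month M, count distinct accounts with a RETURN in M
    and at least one ORDER in months M+1 .. M+window."""
    returns = defaultdict(set)   # month index -> accounts that returned
    orders = defaultdict(set)    # account -> month indexes with an order
    labels = {}                  # month index -> original 'YYYY-MM' label
    for account, kind, ym in transactions:
        m = month_index(ym)
        labels[m] = ym
        if kind == "RETURN":
            returns[m].add(account)
        elif kind == "ORDER":
            orders[account].add(m)
    result = {}
    for m, accounts in sorted(returns.items()):
        result[labels[m]] = sum(
            1 for a in accounts
            if any(m < om <= m + window for om in orders[a])
        )
    return result

# Hypothetical sample rows: (account id, transaction type, year-month)
sample = [
    ("A", "RETURN", "2023-01"), ("A", "ORDER", "2023-03"),  # reorders in time
    ("B", "RETURN", "2023-01"), ("B", "ORDER", "2023-09"),  # reorders too late
    ("C", "RETURN", "2023-02"), ("C", "ORDER", "2023-02"),  # same month only
    ("D", "RETURN", "2023-02"), ("D", "ORDER", "2023-08"),  # exactly M+6
]
print(returners_who_reordered(sample))
```

Whether an order in month M itself should count, and whether the window is inclusive of month M+6, are assumptions made here; adjust the comparison if the requirement differs.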

Related

Calculating total revenue lost in returns with Superstore dataset

I am attempting to create a parameter that allows me to custom filter for Sales, Units, Profit, Orders, Returns, and Return Units. I'm running into an issue when creating the calculated field that calculates return $ and return units. The result I am getting doesn't seem to be accurate.
I have created a Parameter with the above list called [Choose KPI].
I thought I would be able to create calculated fields for [Returned $] and [Returned Units]:
IF [Returned] = 'Yes' THEN [Sales] END
and
IF [Returned] = 'Yes' THEN [Quantity] END
Which would make my calculated field for [Choose KPI]:
CASE [Parameters].[Choose KPI]
WHEN "Sales" THEN SUM([Sales])
WHEN "Units" THEN SUM([Quantity])
WHEN "Profit" THEN SUM([Profit])
WHEN "Orders" THEN COUNT([Order ID])
WHEN "Returns" THEN SUM([Returns $])
WHEN "Return Units" THEN SUM([Returned Units])
END
Rather than returns being associated with product ID, they appear to be linked only with the order ID, which creates duplicate values when there were multiple items on an order (are we assuming all items in the order were returned when [Returned] = 'Yes'?), causing the sum of [Returns $] to be inflated.
How can I create a filter that uses the distinct Order ID to calculate total returns and returned units?
In Tableau, you can use the "Group" option in the "Analysis" menu to create a new field that groups your data by the "Order ID" field. Then, you can use the "COUNTD" function to count the number of unique "Order ID" values, which will give you the total number of returned orders.
To calculate the total returned units, you can create a calculated field that multiplies the quantity of each returned item by the number of unique items returned per order. You can use the COUNTD([Order ID]) and SUM(IF [Returned] = 'Yes' THEN [Quantity] END) in the calculation.
In the [Choose KPI] field, you can use the following calculation:
CASE [Parameters].[Choose KPI]
WHEN "Sales" THEN SUM([Sales])
WHEN "Units" THEN SUM([Quantity])
WHEN "Profit" THEN SUM([Profit])
WHEN "Orders" THEN COUNTD([Order ID])
WHEN "Returns" THEN SUM(IF [Returned] = 'Yes' THEN [Sales] END)
WHEN "Return Units" THEN SUM(IF [Returned] = 'Yes' THEN [Quantity] END) * COUNTD([Order ID])
END
This way, you will be able to filter your data based on the different KPIs you've defined, and the calculations for returned sales and units will be based on the distinct Order ID values.
(Answer by https://chat.openai.com/chat and formatted by me.)

Selecting max value grouped by specific column

Focused DB tables:
Task:
For given location ID and culture ID, get max(crop_yield.value) * culture_price.price (let's call this multiplication monetaryGain) grouped by year, so something like:
[
  {
    "year": 2014,
    "monetaryGain": ...
  },
  {
    "year": 2015,
    "monetaryGain": ...
  },
  {
    "year": 2016,
    "monetaryGain": ...
  },
  ...
]
Attempt:
SELECT cp.price * max(cy.value) AS monetaryGain, EXTRACT(YEAR FROM cy.date) AS year
FROM culture_price AS cp
JOIN culture AS c ON cp.id_culture = c.id
JOIN crop_yield AS cy ON cy.id_culture = c.id
WHERE c.id = :cultureId AND cy.id_location = :locationId AND cp.year = year
GROUP BY year
ORDER BY year
The problem:
"columns "cp.price", "cy.value" and "cy.date" must appear in the GROUP BY clause or be used in an aggregate function"
If I put these three columns in GROUP BY, I won't get expected result - It won't be grouped just by year obviously.
Does anyone have an idea on how to fix/write this query better in order to get task result?
Thanks in advance!
The fix
Rewrite monetaryGain to be:
max(cp.price * cy.value) AS monetaryGain
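That way you are not required to group by cp.price, because it is no longer output as a plain column but used inside an aggregate. The result matches cp.price * max(cy.value) as long as each group has a single, non-negative price.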
Why?
When you write a GROUP BY query, you can only output columns that are in the GROUP BY list, plus aggregate function values. This is to be expected: you want a single row per group, but a column that is not in the grouping list may have several distinct values within that group.
For the same reason, you cannot use a non-grouping column in arithmetic or in any other non-aggregate function: it would produce several results for a single row, and there would be no way to display them.
This is a VERY loose explanation, but I hope it helps to grasp the concept.
Aliases in GROUP BY
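Also, you should not rely on the alias in GROUP BY here. Use:
GROUP BY EXTRACT(YEAR FROM cy.date)
PostgreSQL does accept output aliases in GROUP BY, but an ambiguous name is resolved as an input column first, so here year refers to cp.year rather than to the alias. This link might explain why: https://www.postgresql.org/message-id/7608.1259177709%40sss.pgh.pa.us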
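As a rough illustration of the rewritten query, the sketch below runs it against an in-memory SQLite database from Python. Several things here are assumptions rather than part of the original schema: the table definitions are reduced to the columns the query touches, the sample rows are made up, SQLite's strftime stands in for PostgreSQL's EXTRACT(YEAR FROM ...), and the price is joined on the year of the yield date, which is a guess at what cp.year = year was meant to express.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, reduced schema and sample data.
conn.executescript("""
CREATE TABLE culture_price (id_culture INTEGER, year INTEGER, price REAL);
CREATE TABLE crop_yield (id_culture INTEGER, id_location INTEGER, date TEXT, value REAL);
INSERT INTO culture_price VALUES (1, 2014, 2.0), (1, 2015, 3.0);
INSERT INTO crop_yield VALUES
    (1, 10, '2014-05-01', 5.0),
    (1, 10, '2014-09-01', 8.0),
    (1, 10, '2015-06-01', 4.0),
    (1, 99, '2015-06-01', 50.0);
""")

# The aggregate wraps the whole product, and GROUP BY repeats the
# year expression instead of using the output alias.
query = """
SELECT CAST(strftime('%Y', cy.date) AS INTEGER) AS year,
       MAX(cp.price * cy.value) AS monetaryGain
FROM crop_yield AS cy
JOIN culture_price AS cp
  ON cp.id_culture = cy.id_culture
 AND cp.year = CAST(strftime('%Y', cy.date) AS INTEGER)
WHERE cy.id_culture = :cultureId AND cy.id_location = :locationId
GROUP BY CAST(strftime('%Y', cy.date) AS INTEGER)
ORDER BY 1
"""
rows = conn.execute(query, {"cultureId": 1, "locationId": 10}).fetchall()
print(rows)
```

With this sample data there is one row per year, and the yield recorded for the other location (id 99) is excluded by the WHERE clause.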

Getting top 1 value for each month

this is the code:
select date_part('month',inspection.idate) as _month, inspector.iname, count(inspector.iname) as num
from inspector,inspection
where inspection.idate>='2021/1/1' and inspector.iid = inspection.iid
group by inspector.iname, _month
order by _month
and this is the result:
[screenshot of the query result not included]
I need to show the top count for each month. For month 6 there are 2 inspectors with the same count, and I need to show both.
You can use ranking functions to solve this, for example DENSE_RANK or RANK.
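A minimal sketch of the ranking approach, run here against an in-memory SQLite database from Python (window functions need SQLite 3.25 or later). The sample rows are invented, the tables are cut down to the columns used in the question, and strftime replaces date_part, so treat it as an illustration of the pattern rather than a drop-in query. RANK() gives every tied row the same rank, so filtering on rank 1 keeps all inspectors who share the top count in a month.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample data; month 6 has two inspectors tied at 2 inspections.
conn.executescript("""
CREATE TABLE inspector (iid INTEGER, iname TEXT);
CREATE TABLE inspection (iid INTEGER, idate TEXT);
INSERT INTO inspector VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
INSERT INTO inspection VALUES
    (1, '2021-05-03'), (1, '2021-05-10'), (2, '2021-05-11'),
    (1, '2021-06-01'), (1, '2021-06-05'),
    (2, '2021-06-02'), (2, '2021-06-09'),
    (3, '2021-06-20');
""")

query = """
WITH counts AS (
    -- inspections per inspector per month, as in the original query
    SELECT CAST(strftime('%m', i.idate) AS INTEGER) AS month_no,
           p.iname,
           COUNT(*) AS num
    FROM inspection AS i
    JOIN inspector AS p ON p.iid = i.iid
    WHERE i.idate >= '2021-01-01'
    GROUP BY CAST(strftime('%m', i.idate) AS INTEGER), p.iname
),
ranked AS (
    -- rank inspectors within each month by their count, ties share a rank
    SELECT month_no, iname, num,
           RANK() OVER (PARTITION BY month_no ORDER BY num DESC) AS rnk
    FROM counts
)
SELECT month_no, iname, num
FROM ranked
WHERE rnk = 1
ORDER BY month_no, iname
"""
rows = conn.execute(query).fetchall()
print(rows)
```

For a "top 1 with ties" filter, RANK and DENSE_RANK behave the same; they only differ in the ranks assigned after a tie.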

Tableau - Calculating average where date is less than value from another data source

I am trying to calculate the average of a column in Tableau, except the problem is I am trying to use a single date value (based on filter) from another data source to only calculate the average where the exam date is <= the filtered date value from the other source.
Note: Parameters will not work for me here, since new date values are being added constantly to the set.
I have tried many different approaches, but the simplest was trying to use a calculated field that pulls in the filtered exam date from the other data source.
It successfully can pull the filtered date, but the formula does not work as expected. 2 versions of the calculation are below:
IF DATE(ATTR([Exam Date])) <= DATE(ATTR([Averages (Tableau Test Scores)].[Updated])) THEN AVG([Raw Score]) END
IF DATEDIFF('day', DATE(ATTR([Exam Date])), DATE(ATTR([Averages (Tableau Test Scores)].[Updated]))) > 1 THEN AVG([Raw Score]) END
Basically, I am looking for the equivalent of this in SQL Server:
SELECT AVG([Raw Score]) WHERE ExamDate <= (Filtered Exam Date)
Below is a workbook that shows an example of what I am trying to accomplish. Currently it returns all blanks, likely due to the many-to-one comparison I am trying to use in my calculation.
Any feedback is greatly appreciated!
Tableau Test Exam Workbook
I was able to solve this by using Custom SQL to join the tables together and calculate the average based on my conditions, to get the column results I wanted.
Would still be great to have this ability directly in Tableau, but whatever gets the job done.
Edit:
SELECT
[AcademicYear]
,[Discipline]
--Get the number of student takers
,COUNT([Id]) AS [Students (N)]
--Get the average of the Raw Score
,CAST(AVG(RawScore) AS DECIMAL(10,2)) AS [School Mean]
--Get the number of failures based on an "adjusted score" column
,COUNT(CASE WHEN [AdjustedScore] < 70 THEN 1 END) AS [School Failures]
--This is the column used as the cutoff point for including scores
,[Average_Update].[Updated]
FROM [dbo].[Average] [Average]
FULL OUTER JOIN [dbo].[Average_Update] [Average_Update] ON ([Average_Update].[Id] = [Average].UpdateDateId)
--The meat of joining data for accurate calculations
FULL OUTER JOIN (
SELECT DISTINCT S.[Id], S.[LastName], S.[FirstName], S.[ExamDate], S.[RawScoreStandard], S.[RawScorePercent], S.[AdjustedScore], S.[Subject], P.[Id] AS PeriodId
FROM [StudentScore] S
FULL OUTER JOIN
(
--Get only the 1st attempt
SELECT DISTINCT [NBOMEId], S2.[Subject], MIN([ExamDate]) AS ExamDate
FROM [StudentScore] S2
GROUP BY [NBOMEId],S2.[Subject]
) B
ON S.[NBOMEId] = B.[NBOMEId] AND S.[Subject] = B.[Subject] AND S.[ExamDate] = B.[ExamDate]
--Group in "Exam Periods" based on the list of periods w/ start & end dates in another table.
FULL OUTER JOIN [ExamPeriod] P
ON S.[ExamDate] >= P.PeriodStart AND S.[ExamDate] <= P.PeriodEnd
WHERE S.[Subject] = B.[Subject]
GROUP BY P.[Id], S.[Subject], S.[ExamDate], S.[RawScoreStandard], S.[RawScorePercent], S.[AdjustedScore], S.[NBOMEId], S.[NBOMELastName], S.[NBOMEFirstName], S.[SecondYrTake]) [StudentScore]
ON
([StudentScore].PeriodId = [Average_Update].ExamPeriodId
AND [StudentScore].Subject = [Average].Subject
AND [StudentScore].[ExamDate] <= [Average_Update].[Updated])
--End meat
--Joins to pull in relevant data for normalized tables
FULL OUTER JOIN [dbo].[Student] [Student] ON ([StudentScore].[NBOMEId] = [Student].[NBOMEId])
INNER JOIN [dbo].[ExamPeriod] [ExamPeriod] ON ([Average_Update].ExamPeriodId = [ExamPeriod].[Id])
INNER JOIN [dbo].[AcademicYear] [AcademicYear] ON ([ExamPeriod].[AcademicYearId] = [AcademicYear].[Id])
--This will pull only the latest update entry for every academic year.
WHERE [Updated] IN (
SELECT DISTINCT MAX([Updated]) AS MaxDate
FROM [Average_Update]
GROUP BY [ExamPeriodId])
GROUP BY [AcademicYear].[AcademicYearText], [Average].[Subject], [Average_Update].[Updated]
ORDER BY [AcademicYear].[AcademicYearText], [Average_Update].[Updated], [Average].[Subject]
I couldn't download your file to test with your data, but try reversing the order of taking the average, i.e.
AVG(IF DATE(ATTR([Exam Date])) <= DATE(ATTR([Averages (Tableau Test Scores)].[Updated])) THEN [Raw Score] END)
As written, I believe you are averaging the data before returning it from the IF statement, whereas you want to return the data first and then average it.
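The "filter first, then average" order can be shown outside Tableau with a small Python sketch. The function name and the sample scores are hypothetical; the only thing it is meant to show is that the date condition is applied per row and the average is taken over the rows that remain, which is what the SQL in the question (AVG ... WHERE ExamDate <= cutoff) does.

```python
from datetime import date
from statistics import mean

def average_up_to(scores, cutoff):
    """Average the raw scores of exams taken on or before the cutoff date.

    Returns None when no exam falls on or before the cutoff.
    """
    kept = [raw for exam_date, raw in scores if exam_date <= cutoff]
    return mean(kept) if kept else None

# Hypothetical rows: (exam date, raw score)
scores = [
    (date(2023, 1, 10), 80),
    (date(2023, 2, 15), 90),
    (date(2023, 3, 20), 40),  # after the cutoff, so it is left out
]
cutoff = date(2023, 2, 28)
print(average_up_to(scores, cutoff))
```

Averaging all three scores first and then testing the date would give 70 or nothing at all, never the average of the two qualifying rows.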

TSQL Cursor Alternative to Speed up my query

Row Status Time
1 Status1 1383264075
2 Status1 1383264195
3 Status1 1383264315
4 Status2 1383264435
5 Status2 1383264555
6 Status2 1383264675
7 Status2 1383264795
8 Status1 1383264915
9 Status3 1383265035
10 Status3 1383265155
11 Status2 1383265275
12 Status3 1383265395
13 Status1 1383265515
14 Status1 1383265535
15 Status2 1383265615
The [Time] column holds POSIX time
I want to be able to calculate the number of seconds a given [Status] is active within a given time period without using CURSORS. If a cursor is the only way, then that is fine, as I've already done that.
Using the above sample data extract, how do I calculate how long "Status1" has been active for?
That is, subtract Row1.[Time] from Row4.[Time], subtract Row8.[Time] from Row9.[Time], and subtract Row13.[Time] from Row15.[Time].
Thank you in advance
Assuming that each row represents that the specific Status is active from the specified Time until the next row, one would have to somehow calculate the difference between row N and N+1. One way would be to use a nested query (try it here: SQL Fiddle).
SELECT SUM(Duration) as Duration
FROM (
SELECT f.Status, s.Time-f.Time as Duration
FROM Table1 f
JOIN Table1 s on s.Row = f.Row+1
WHERE f.Status = 'Status1') a
The solution by @erikxiv will work if the Row values have no gaps. If they do have gaps, you could try the following method:
SELECT
TotalDuration = SUM(next.Time - curr.Time)
FROM
dbo.atable AS curr
CROSS APPLY
(
SELECT TOP (1) Time
FROM dbo.atable
WHERE Row > curr.Row
ORDER BY Row ASC
) AS next
WHERE
curr.Status = 'Status1'
;
For every row matching the specified status, the correlated subquery in the CROSS APPLY clause will fetch the next Time value based on the ascending order of Row. The current row's time is then subtracted from the next row's time and all the differences are added up using SUM().
Please note that in both solutions it is implied that the order of Row values follows the order of Time values. In other words, ORDER BY Row is assumed to be equivalent to ORDER BY Time or, if Time can have duplicates, to ORDER BY Time, Row.
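As a further variant, and only as a sketch: the same "next row's time minus this row's time" idea can be written with the LEAD window function, which SQL Server has offered since version 2012 and which also tolerates gaps in Row. The example below runs it on the sample data from the question in an in-memory SQLite database (3.25 or later) from Python. The table name status_log and the column names row_no and ts are hypothetical stand-ins for Row and Time.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE status_log (row_no INTEGER, status TEXT, ts INTEGER)")

# Sample data from the question: (Row, Status, Time as POSIX seconds).
rows = [
    (1, "Status1", 1383264075), (2, "Status1", 1383264195),
    (3, "Status1", 1383264315), (4, "Status2", 1383264435),
    (5, "Status2", 1383264555), (6, "Status2", 1383264675),
    (7, "Status2", 1383264795), (8, "Status1", 1383264915),
    (9, "Status3", 1383265035), (10, "Status3", 1383265155),
    (11, "Status2", 1383265275), (12, "Status3", 1383265395),
    (13, "Status1", 1383265515), (14, "Status1", 1383265535),
    (15, "Status2", 1383265615),
]
conn.executemany("INSERT INTO status_log VALUES (?, ?, ?)", rows)

# LEAD fetches the following row's timestamp; the last row gets NULL,
# which SUM ignores. The status filter is applied after LEAD has run,
# so "next row" always means the next row of any status.
query = """
SELECT SUM(next_ts - ts) AS total_duration
FROM (
    SELECT status, ts,
           LEAD(ts) OVER (ORDER BY row_no) AS next_ts
    FROM status_log
) AS t
WHERE status = ?
"""
total = conn.execute(query, ("Status1",)).fetchone()[0]
print(total)
```

For Status1 this adds up the same intervals the question lists (rows 1 to 4, 8 to 9, and 13 to 15), namely 360 + 120 + 100 seconds.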