Tableau - Calculating average where date is less than value from another data source - tableau-api

I am trying to calculate the average of a column in Tableau, except the problem is I am trying to use a single date value (based on filter) from another data source to only calculate the average where the exam date is <= the filtered date value from the other source.
Note: Parameters will not work for me here, since new date values are being added constantly to the set.
I have tried many different approaches, but the simplest was trying to use a calculated field that pulls in the filtered exam date from the other data source.
It successfully can pull the filtered date, but the formula does not work as expected. 2 versions of the calculation are below:
IF DATE(ATTR([Exam Date])) <= DATE(ATTR([Averages (Tableau Test Scores)].[Updated])) THEN AVG([Raw Score]) END
IF DATEDIFF('day', DATE(ATTR([Exam Date])), DATE(ATTR([Averages (Tableau Test Scores)].[Updated]))) > 1 THEN AVG([Raw Score]) END
Basically, I am looking for the equivalent of this in SQL Server:
SELECT AVG([Raw Score]) WHERE ExamDate <= (Filtered Exam Date)
Below a workbook that shows an example of what I am trying to accomplish. Currently it returns all blanks, likely due to the many-to-one comparison I am trying to use in my calculation.
Any feedback is greatly appreciated!
Tableau Test Exam Workbook

I was able to solve this by using Custom SQL to join the tables together and calculate the average based on my conditions, to get the column results I wanted.
Would still be great to have this ability directly in Tableau, but whatever gets the job done.
Edit:
SELECT
[AcademicYear]
,[Discipline]
--Get the number of student takers
,COUNT([Id]) AS [Students (N)]
--Get the average of the Raw Score
,CAST(AVG(RawScore) AS DECIMAL(10,2)) AS [School Mean]
--Get the number of failures based on an "adjusted score" column
,COUNT([AdjustedScore] < 70 THEN 1 END) AS [School Failures]
--This is the column used as the cutoff point for including scores
,[Average_Update].[Updated]
FROM [dbo].[Average] [Average]
FULL OUTER JOIN [dbo].[Average_Update] [Average_Update] ON ([Average_Update].[Id] = [Average].UpdateDateId)
--The meat of joining data for accurate calculations
FULL OUTER JOIN (
SELECT DISTINCT S.[Id], S.[LastName], S.[FirstName], S.[ExamDate], S.[RawScoreStandard], S.[RawScorePercent], S.[AdjustedScore], S.[Subject], P.[Id] AS PeriodId
FROM [StudentScore] S
FULL OUTER JOIN
(
--Get only the 1st attempt
SELECT DISTINCT [NBOMEId], S2.[Subject], MIN([ExamDate]) AS ExamDate
FROM [StudentScore] S2
GROUP BY [NBOMEId],S2.[Subject]
) B
ON S.[NBOMEId] = B.[NBOMEId] AND S.[Subject] = B.[Subject] AND S.[ExamDate] = B.[ExamDate]
--Group in "Exam Periods" based on the list of periods w/ start & end dates in another table.
FULL OUTER JOIN [ExamPeriod] P
ON S.[ExamDate] = P.PeriodStart AND S.[ExamDate] <= P.PeriodEnd
WHERE S.[Subject] = B.[Subject]
GROUP BY P.[Id], S.[Subject], S.[ExamDate], S.[RawScoreStandard], S.[RawScorePercent], S.[AdjustedScore], S.[NBOMEId], S.[NBOMELastName], S.[NBOMEFirstName], S.[SecondYrTake]) [StudentScore]
ON
([StudentScore].PeriodId = [Average_Update].ExamPeriodId
AND [StudentScore].Subject = [Average].Subject
AND [StudentScore].[ExamDate] <= [Average_Update].[Updated])
--End meat
--Joins to pull in relevant data for normalized tables
FULL OUTER JOIN [dbo].[Student] [Student] ON ([StudentScore].[NBOMEId] = [Student].[NBOMEId])
INNER JOIN [dbo].[ExamPeriod] [ExamPeriod] ON ([Average_Update].ExamPeriodId = [ExamPeriod].[Id])
INNER JOIN [dbo].[AcademicYear] [AcademicYear] ON ([ExamPeriod].[AcademicYearId] = [AcademicYear].[Id])
--This will pull only the latest update entry for every academic year.
WHERE [Updated] IN (
SELECT DISTINCT MAX([Updated]) AS MaxDate
FROM [Average_Update]
GROUP BY[ExamPeriodId])
GROUP BY [AcademicYear].[AcademicYearText], [Average].[Subject], [Average_Update].[Updated],
ORDER BY [AcademicYear].[AcademicYearText], [Average_Update].[Updated], [Average].[Subject]

I couldn't download your file to test with your data, but try reversing the order of taking the average ie
average(IF DATE(ATTR([Exam Date])) <= DATE(ATTR([Averages (Tableau Test Scores)].[Updated]) then [Raw Score]) END)
as written, I believe you'll be averaging the data before returning it from the if statement, whereas you want to return the data, then average it.

Related

Return closest timestamp from Table B based on timestamp from Table A with matching Product IDs

Goal: Create a query to pull the closest cycle count event (Table C) for a product ID based on the inventory adjustments results sourced from another table (Table A).
All records from Table A will be used, but is not guaranteed to have a match in Table C.
The ID column will be present in both tables, but is not unique in either, so that pair of IDs and Timestamps together are needed for each table.
Current simplified SQL
SELECT
A.WHENOCCURRED,
A.LPID,
A.ITEM,
A.ADJQTY,
C.WHENOCCURRED,
C.LPID,
C.LOCATION,
C.ITEM,
C.QUANTITY,
C.ENTQUANTITY
FROM
A
LEFT JOIN
C
ON A.LPID = C.LPID
WHERE
A.facility = 'FACID'
AND A.WHENOCCURRED > '23-DEC-22'
AND A.ADJREASONABBREV = 'CYCLE COUNTS'
ORDER BY A.WHENOCCURRED DESC
;
This is currently pulling the first hit on C.WHENOCCURRED on the LPID matches. Want to see if there is a simpler JOIN solution before going in a direction that creates 2 temp tables based on WHENOCCURRED.
I have a functioning INDEX(MATCH(MIN()) solution in Excel but that requires exporting a couple system reports first and is extremely slow with X,XXX row tables.
If you are using Oracle 12 or later, you can use a LATERAL join and FETCH FIRST ROW ONLY:
SELECT A.WHENOCCURRED,
A.LPID,
A.ITEM,
A.ADJQTY,
C.WHENOCCURRED,
C.LPID,
C.LOCATION,
C.ITEM,
C.QUANTITY,
C.ENTQUANTITY
FROM A
LEFT OUTER JOIN LATERAL (
SELECT *
FROM C
WHERE A.LPID = C.LPID
AND A.whenoccurred <= c.whenoccurred
ORDER BY c.whenoccurred
FETCH FIRST ROW ONLY
) C
ON (1 = 1) -- The join condition is inside the lateral join
WHERE A.facility = 'FACID'
AND A.WHENOCCURRED > DATE '2022-12-23'
AND A.ADJREASONABBREV = 'CYCLE COUNTS'
ORDER BY A.WHENOCCURRED DESC;

Unexpected Results From Postgresql Query

I am trying to find a way to create a table where it shows each customer profile (71 in total) their top item bought per time frame (10 in total), what that item is, and the time frame that was most popular. Whenever I run this query it shows the top items for a time frame but it shows all the customers as null. I also need a way to display the customer name which is also accessed through the id_table. I'm lost so any direction would be greatly appreciated! I only have read permissions on this DB.
select distinct id_table.name as product_name, pb.recruitment_round, count(pb.purchased), st.cust_dbf_id as cust_profile
from product_bought pb
join id_table
on id_table.dbf_id = pb.dbf_id
left join shopper_table st
on st.cust_dbf_id = id_table.dbf_id
where pb.date >= '2022-01-01'
and pb.date <= '2022-01-05'
and pb.shopping_time = 4
group by id_table."name", pb.recruitment_round, pb.cust_dbf_id
order by count(pb.purchased) desc, pb.recruitment_round
limit 1;
Expected: a return of st.cust_dbf_id.
Received: Null values

How to keep one record in specific column and make other record value 0 in group by clause in PostgreSQL?

I have a set of data like this
The Result should look Like this
My Query
SELECT max(pi.pi_serial) AS proforma_invoice_id,
max(mo.manufacturing_order_master_id) AS manufacturing_order_master_id,
max(pi.amount_in_local_currency) AS sales_value,
FROM proforma_invoice pi
JOIN schema_order_map som ON pi.pi_serial = som.pi_id
LEFT JOIN manufacturing_order_master mo ON som.mo_id = mo.manufacturing_order_master_id
WHERE to_date(pi.proforma_invoice_date, 'DD/MM/YYYY') BETWEEN to_date('01/03/2021', 'DD/MM/YYYY') AND to_date('19/04/2021', 'DD/MM/YYYY')
AND pi.pi_serial in (9221,
9299)
GROUP BY mo.manufacturing_order_master_id,
pi.pi_serial
ORDER BY pi.pi_serial
Option 1: Create a "Running Total" field in Crystal Reports to sum up only one "sales_value" per "proforma_invoice_id".
Option 2: Add a helper column to your Postgresql query like so:
case
when row_number()
over (partition by proforma_invoice_id
order by manufacturing_order_master_id)
= 1
then sales_value
else 0
end
as sales_value
I prepared this SQLFiddle with an example for you (and would of course like to encourage you to do the same for your next db query related question on SO, too :-)

How to query the first row efficiently?

I have a table with large amount of records:
date instrument price
2019.03.07 X 1.1
2019.03.07 X 1.0
2019.03.07 X 1.2
...
When I query for the day opening price, I use:
1 sublist select from prices where date = 2019.03.07, instrument = `X
It takes a long time to execute because it selects all the prices on that day and get the first one.
I also tried:
select from prices where date = 2019.03.07, instrument = `X, i = 0 //It does not return any record (why?)
select from prices where date = 2019.03.07, instrument = `X, i = first i //Seem to work. Does it?
In Oracle an equivalent will be:
select * from prices where date = to_date(...) and instrument = "X" and rownum = 1
and Oracle will stop immediately when it finds the first record.
How to do this in KDB (e.g. stop immediately after it finds the first record)?
In kdb, where subclauses in select statements are executed sequentially. i.e. only those records which pass the first "test" get passed to the second test. With that in mind, looking at your two attempts:
select from prices where date = 2019.03.07, instrument = `X, i = 0 //It does not return any record (why?)
This doesn't (necessarily) return anything, because by the time it gets to the i=0 check, you've already filtered out some records (possibly including the first record in the original table, which would have i=0)
select from prices where date = 2019.03.07, instrument = `X, i = first i //Seem to work. Does it?
This one should work. First you filter by date. Then within the records for that date, you select the records for instrument `X. Then within those records, you take the record where i is the first i (where i has already been filtered down, so first i is simply the index of the first record [still the index from the original table, not the filtered down version])
Q-SQL equivalent for that is select[n] which also performs better than other approaches in most of the cases. Positive 'n' will give first n records and negative will give last n records.
q) select[1] from prices where date = 2019.03.07, instrument = `X
There is no inbuilt functionality to stop after first match. You can write custom function for that but that would probably execute slower than above supported version.

TSQL select and join issue

I have two tables, EMPL which is a historical employee table to track changes in an employee's tax rate and PAYROLL which is also a historical table filled with employee pay over a number of periods.
FROM EMPL, based upon the EMPL.effect_pd <= PAYROLL.payroll_pd, only one record should be joined from EMPL to PAYROLL.
Below are the two tables, query and result set. However, I only want 1 record for each employee per pay period, which matches the relevant employee record based upon the payroll_pd and effect_pd.
(Click image to enlarge)
first of all - welcome!
You wrote "...FROM EMPL, based upon the EMPL.effect_pd <= PAYROLL.payroll_pd ..." but you start your SQL with PAYROLL and not with EMPL.
Pls test this statement first:
SELECT
E.rec_id
,E.empl_id
,E.empl_name
,E.tax_rate
,E.effect_pd
,P.rec_id
,P.payroll_pd
,P.empl_id
,P.pd_pay
FROM
empl AS E
LEFT OUTER JOIN
payroll AS P
ON E.empl_id = P.empl_id
AND E.effect_pd < P.payroll_pd
After that you get 7 records witch are uniqe.
i think, thats it.
Best regards
After 3 days of messing around with the code, I finally arrived at the solution which is:
SELECT * FROM PAYROLL p
LEFT JOIN EMPL e on p.empl_id = e.empl_id
WHERE e.rec_id = ( SELECT TOP 1 c.rec_id
FROM EMPL c
WHERE c.empl_id = p.empl_id
AND p.payroll_pd >= c.effect_pd
ORDER BY c.effect_pd DESC );