I have a very large table with 4 columns: 1) the status a member's status property changed to (online, offline, game_lobby, or load_screen); 2) the status it changed from (the same four values); 3) the member's ID number; and 4) the timestamp of the change. I want to calculate the average time members spend online, i.e. the difference between the timestamp of a change from offline to online and the timestamp of the following change back to offline:
sample dataset
Using the sample linked above, the average would be ((01/03/2016 15:32:05 - 01/02/2016 07:18:32) + (03/14/2016 05:46:41 - 03/14/2016 04:09:04)) / 2
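As a quick check of that arithmetic, here is a small Python sketch. It assumes the sample dates are month-first (MM/DD/YYYY), which is the only reading under which 03/14/2016 parses; the `sessions` list simply copies the four timestamps from the formula.

```python
from datetime import datetime, timedelta

# Assumption: sample dates are MM/DD/YYYY.
FMT = "%m/%d/%Y %H:%M:%S"

# (login, logout) pairs copied from the worked example.
sessions = [
    ("01/02/2016 07:18:32", "01/03/2016 15:32:05"),
    ("03/14/2016 04:09:04", "03/14/2016 05:46:41"),
]

def session_length(login: str, logout: str) -> timedelta:
    """Time from the offline->online change to the next change into offline."""
    return datetime.strptime(logout, FMT) - datetime.strptime(login, FMT)

durations = [session_length(login, logout) for login, logout in sessions]
average = sum(durations, timedelta()) / len(durations)
print(average)  # 16:55:35
```

So the expected result for the sample is 60,935 seconds, roughly 16.9 hours.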
Here's what I wrote. It returns negative averages for a few members, which can't be right:
with sessions as
( select
date_trunc('week', sc.occurred_at) as week,
sc.occurred_at,
sc.id,
timestampdiff(second,lag(sc.occurred_at) over (order by sc.id asc, sc.occurred_at),
sc.occurred_at)/3600 as session
from state_changes sc
where
((from_state = 'offline' and to_state = 'online') or
(from_state = 'offline' and to_state = 'online'))
and occurred_at at time zone 'America/New_york' > '2016-01-01'
)
select week, avg(session), id
from sessions
group by 1,3;
I can roll the averages up into a single value instead of per member, but the query is clearly wrong, since a few of the averages come back negative. Does anyone have any suggestions?
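You are basically interested in the time between going offline->online and the next ?->offline change. So the trick is to select only those two kinds of rows in a subquery and then apply lag() over them. Your code has a problem with exactly those two points: the WHERE clause tests offline->online twice, so the logout rows are never selected, and lag() is not partitioned by member, so the first row of each member is paired with the last row of the previous member, which is where the negative values come from. In the main query you then take the average and throw out the offline->online rows.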
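SELECT date_trunc('week', logout) AS week,
       avg(extract(epoch from logout - login)) AS avg_session, -- in seconds
       id
FROM (
    -- Pair each row with the previous qualifying row of the SAME member
    SELECT lag(occurred_at) OVER (PARTITION BY id ORDER BY occurred_at) AS login,
           occurred_at AS logout,
           id,
           to_state
    FROM state_changes
    -- Keep only changes out of 'offline' (logins) and into 'offline' (logouts)
    WHERE (from_state = 'offline' OR to_state = 'offline')
      AND occurred_at > '2016-01-01') sub
-- Keep the logout rows; their lag() value is the matching login time
WHERE to_state = 'offline'
GROUP BY 1, 3;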
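To sanity-check the pairing logic without a Redshift/PostgreSQL instance, here is a minimal sketch that runs the same query shape in SQLite through Python's sqlite3 module (LAG needs SQLite 3.25 or newer). The rows are hypothetical, built to reproduce the two sessions from the question's example, with member 1 passing through game_lobby before logging off. SQLite has no date_trunc or extract(epoch ...), so the weekly bucket is dropped and strftime('%s', ...) stands in for epoch extraction.

```python
import sqlite3

# Hypothetical rows (id, from_state, to_state, occurred_at). ISO timestamps
# sort as text in time order, which is what LAG's ORDER BY relies on here.
ROWS = [
    (1, "offline", "online", "2016-01-02 07:18:32"),
    (1, "online", "game_lobby", "2016-01-02 08:00:00"),
    (1, "game_lobby", "offline", "2016-01-03 15:32:05"),
    (2, "offline", "online", "2016-03-14 04:09:04"),
    (2, "online", "offline", "2016-03-14 05:46:41"),
]

# Same shape as the answer: keep rows entering or leaving 'offline', pair each
# with the previous row of the same member, then keep only the logout rows.
SESSIONS_SQL = """
SELECT id,
       CAST(strftime('%s', logout) AS INTEGER)
         - CAST(strftime('%s', login) AS INTEGER) AS seconds
FROM (
    SELECT LAG(occurred_at) OVER (PARTITION BY id ORDER BY occurred_at) AS login,
           occurred_at AS logout,
           id,
           to_state
    FROM state_changes
    WHERE from_state = 'offline' OR to_state = 'offline'
) AS sub
WHERE to_state = 'offline'
ORDER BY id, logout
"""

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE state_changes (id INTEGER, from_state TEXT, to_state TEXT, occurred_at TEXT)"
)
con.executemany("INSERT INTO state_changes VALUES (?, ?, ?, ?)", ROWS)

sessions = con.execute(SESSIONS_SQL).fetchall()
overall_avg = con.execute(f"SELECT AVG(seconds) FROM ({SESSIONS_SQL})").fetchone()[0]
print(sessions)     # [(1, 116013), (2, 5857)]
print(overall_avg)  # 60935.0
```

The online->game_lobby row is dropped by the inner WHERE, so member 1's logout is still paired with its login, and the overall average matches the hand calculation in the question.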
Related
I am trying to write a SQL query where the results would show the first value (ID) per user per day for the last year.
I tried using the query below and am able to get results for one day but when I try to change the time range to > 2021-06-01, it does not give me the results I expect.
select * from table
where value in
(
SELECT min(value)
FROM table
WHERE valueid = x
group by user
) and Time = '2022-05-30' and value is not null
I am trying to build a table that shows, for each customer profile (71 in total), their top item bought per time frame (10 in total): what that item is and which time frame was most popular. Whenever I run this query, it shows the top items for a time frame, but all the customers come back as NULL. I also need to display the customer name, which is also accessed through id_table. I'm lost, so any direction would be greatly appreciated! I only have read permissions on this DB.
select distinct id_table.name as product_name, pb.recruitment_round, count(pb.purchased), st.cust_dbf_id as cust_profile
from product_bought pb
join id_table
on id_table.dbf_id = pb.dbf_id
left join shopper_table st
on st.cust_dbf_id = id_table.dbf_id
where pb.date >= '2022-01-01'
and pb.date <= '2022-01-05'
and pb.shopping_time = 4
group by id_table."name", pb.recruitment_round, pb.cust_dbf_id
order by count(pb.purchased) desc, pb.recruitment_round
limit 1;
Expected: a return of st.cust_dbf_id.
Received: Null values
With this query:
SELECT date_trunc('minute', ts) ts, instrument
FROM test
GROUP BY date_trunc('minute', ts), instrument
ORDER BY ts
I am grouping rows by minute, but I would like to generate a boolean value that tells me whether the group contains at least one row whose timestamp has seconds < 10 and at least one row whose timestamp has seconds > 50.
In short, something like:
lessThan10 = false
moreThan50 = false
for each row in the one minute group:
if row.ts.seconds < 10 then lessThan10 = true
if row.ts.seconds > 50 then moreThan50 = true
return lessThan10 && moreThan50
What I am trying to achieve is to find out if all the records I aggregate cover the beginning and the end of the minute; it's ok if there are holes here and there, but it's possible the data we capture stops and restarts at second 40 for example and, in that case, I'd like to be able to discard the whole minute.
As the data rate varies quite a lot, I can't check for a minimum number of rows. There may be a better way to achieve this, so I'm open to that as well.
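Use EXTRACT() to get the seconds of the min and max values of ts. Since every row in a group falls in the same minute, the group has a row with seconds < 10 exactly when MIN(ts) does, and a row with seconds > 50 exactly when MAX(ts) does: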
SELECT date_trunc('minute', ts) ts, instrument,
EXTRACT(SECOND FROM MIN(ts)) < 10 lessThan10,
EXTRACT(SECOND FROM MAX(ts)) > 50 moreThan50
FROM test
GROUP BY date_trunc('minute', ts), instrument
ORDER BY ts
See the demo.
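To see that the MIN/MAX check gives the same answer as the question's row-by-row loop, here is a small Python sketch that stands in for the database. The rows are hypothetical; the grouping key mimics date_trunc('minute', ts) plus instrument, and both functions are run on every group.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical (ts, instrument) rows.
ROWS = [
    (datetime(2024, 1, 1, 9, 0, 3), "A"),
    (datetime(2024, 1, 1, 9, 0, 30), "A"),
    (datetime(2024, 1, 1, 9, 0, 55), "A"),  # A at 09:00 covers both ends
    (datetime(2024, 1, 1, 9, 1, 2), "A"),
    (datetime(2024, 1, 1, 9, 1, 40), "A"),  # A at 09:01 stops at second 40
    (datetime(2024, 1, 1, 9, 0, 45), "B"),
    (datetime(2024, 1, 1, 9, 0, 58), "B"),  # B at 09:00 starts late
]

def covers_minute_loop(seconds):
    """Direct translation of the question's pseudocode."""
    less_than_10 = False
    more_than_50 = False
    for s in seconds:
        if s < 10:
            less_than_10 = True
        if s > 50:
            more_than_50 = True
    return less_than_10 and more_than_50

def covers_minute_minmax(seconds):
    """The answer's approach: only the earliest and latest rows matter."""
    return min(seconds) < 10 and max(seconds) > 50

# Equivalent of GROUP BY date_trunc('minute', ts), instrument.
groups = defaultdict(list)
for ts, instrument in ROWS:
    minute = ts.replace(second=0, microsecond=0)
    # Include fractional seconds, as EXTRACT(SECOND FROM ts) does in PostgreSQL.
    groups[(minute, instrument)].append(ts.second + ts.microsecond / 1e6)

result = {key: covers_minute_minmax(secs) for key, secs in groups.items()}
for (minute, instrument), ok in sorted(result.items()):
    print(minute.strftime("%H:%M"), instrument, ok)
# 09:00 A True
# 09:00 B False
# 09:01 A False
```

Because only the extremes matter, the SQL version needs no per-row flags, and a HAVING clause on the same two expressions would discard the incomplete minutes directly.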
I am trying to calculate the average of a column in Tableau, but only over the rows whose exam date is <= a single date value that comes from a filter on another data source.
Note: Parameters will not work for me here, since new date values are being added constantly to the set.
I have tried many different approaches, but the simplest was trying to use a calculated field that pulls in the filtered exam date from the other data source.
It successfully pulls the filtered date, but the formula does not work as expected. Here are two versions of the calculation:
IF DATE(ATTR([Exam Date])) <= DATE(ATTR([Averages (Tableau Test Scores)].[Updated])) THEN AVG([Raw Score]) END
IF DATEDIFF('day', DATE(ATTR([Exam Date])), DATE(ATTR([Averages (Tableau Test Scores)].[Updated]))) > 1 THEN AVG([Raw Score]) END
Basically, I am looking for the equivalent of this in SQL Server:
SELECT AVG([Raw Score]) WHERE ExamDate <= (Filtered Exam Date)
Below is a workbook that shows an example of what I am trying to accomplish. Currently it returns all blanks, likely due to the many-to-one comparison I am trying to use in my calculation.
Any feedback is greatly appreciated!
Tableau Test Exam Workbook
I was able to solve this by using Custom SQL to join the tables together and calculate the average based on my conditions, to get the column results I wanted.
Would still be great to have this ability directly in Tableau, but whatever gets the job done.
Edit:
SELECT
[AcademicYear]
,[Discipline]
--Get the number of student takers
,COUNT([Id]) AS [Students (N)]
--Get the average of the Raw Score
,CAST(AVG(RawScore) AS DECIMAL(10,2)) AS [School Mean]
--Get the number of failures based on an "adjusted score" column
,COUNT(CASE WHEN [AdjustedScore] < 70 THEN 1 END) AS [School Failures]
--This is the column used as the cutoff point for including scores
,[Average_Update].[Updated]
FROM [dbo].[Average] [Average]
FULL OUTER JOIN [dbo].[Average_Update] [Average_Update] ON ([Average_Update].[Id] = [Average].UpdateDateId)
--The meat of joining data for accurate calculations
FULL OUTER JOIN (
SELECT DISTINCT S.[Id], S.[LastName], S.[FirstName], S.[ExamDate], S.[RawScoreStandard], S.[RawScorePercent], S.[AdjustedScore], S.[Subject], P.[Id] AS PeriodId
FROM [StudentScore] S
FULL OUTER JOIN
(
--Get only the 1st attempt
SELECT DISTINCT [NBOMEId], S2.[Subject], MIN([ExamDate]) AS ExamDate
FROM [StudentScore] S2
GROUP BY [NBOMEId],S2.[Subject]
) B
ON S.[NBOMEId] = B.[NBOMEId] AND S.[Subject] = B.[Subject] AND S.[ExamDate] = B.[ExamDate]
--Group in "Exam Periods" based on the list of periods w/ start & end dates in another table.
FULL OUTER JOIN [ExamPeriod] P
ON S.[ExamDate] >= P.PeriodStart AND S.[ExamDate] <= P.PeriodEnd
WHERE S.[Subject] = B.[Subject]
GROUP BY P.[Id], S.[Subject], S.[ExamDate], S.[RawScoreStandard], S.[RawScorePercent], S.[AdjustedScore], S.[NBOMEId], S.[NBOMELastName], S.[NBOMEFirstName], S.[SecondYrTake]) [StudentScore]
ON
([StudentScore].PeriodId = [Average_Update].ExamPeriodId
AND [StudentScore].Subject = [Average].Subject
AND [StudentScore].[ExamDate] <= [Average_Update].[Updated])
--End meat
--Joins to pull in relevant data for normalized tables
FULL OUTER JOIN [dbo].[Student] [Student] ON ([StudentScore].[NBOMEId] = [Student].[NBOMEId])
INNER JOIN [dbo].[ExamPeriod] [ExamPeriod] ON ([Average_Update].ExamPeriodId = [ExamPeriod].[Id])
INNER JOIN [dbo].[AcademicYear] [AcademicYear] ON ([ExamPeriod].[AcademicYearId] = [AcademicYear].[Id])
--This will pull only the latest update entry for every exam period.
WHERE [Updated] IN (
SELECT DISTINCT MAX([Updated]) AS MaxDate
FROM [Average_Update]
GROUP BY [ExamPeriodId])
GROUP BY [AcademicYear].[AcademicYearText], [Average].[Subject], [Average_Update].[Updated]
ORDER BY [AcademicYear].[AcademicYearText], [Average_Update].[Updated], [Average].[Subject]
I couldn't download your file to test with your data, but try reversing the order in which the average is taken, i.e.
AVG(IF DATE(ATTR([Exam Date])) <= DATE(ATTR([Averages (Tableau Test Scores)].[Updated])) THEN [Raw Score] END)
As written, I believe you're averaging the data before the IF statement returns it, whereas you want the IF statement to return the row-level values first and then average them.
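The difference between "test, then average" and "average, then test" can be illustrated outside Tableau with a small Python sketch. This is not Tableau code: the rows and cutoff are hypothetical, `None` plays the role of NULL, and `attr()` is only a rough stand-in for ATTR(), which yields a single value only when every row in the partition agrees.

```python
from datetime import date
from statistics import mean

# Hypothetical exam rows: (exam_date, raw_score).
ROWS = [
    (date(2021, 1, 10), 70),
    (date(2021, 2, 15), 80),
    (date(2021, 3, 20), 90),
]
# Stands in for the single [Updated] date picked by the filter on the other source.
CUTOFF = date(2021, 2, 28)

# Row level first, then aggregate: the IF runs per row and returns None
# for rows past the cutoff; the average skips those.
per_row = [score if exam_date <= CUTOFF else None for exam_date, score in ROWS]
row_then_avg = mean(s for s in per_row if s is not None)

def attr(values):
    """Rough stand-in for ATTR(): a value only if every row agrees, else None."""
    distinct = set(values)
    return distinct.pop() if len(distinct) == 1 else None

# Aggregate first, then test: the condition sees one date for the whole
# partition, which is None here because the exam dates differ.
attr_date = attr(d for d, _ in ROWS)
if attr_date is not None and attr_date <= CUTOFF:
    avg_then_if = mean(s for _, s in ROWS)
else:
    avg_then_if = None

print(row_then_avg)  # 75
print(avg_then_if)   # None
```

The row-level version matches the SQL the question wants (AVG over rows with ExamDate <= cutoff), while the aggregate-first version can only return the overall mean or nothing, which may explain the blanks.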
I have to solve a problem and don't know how to do it. I'm using SQL Server 2012.
My data has this schema:
--------------------------------------------
Column                  | Type
--------------------------------------------
DriverId                | integer
BeginDate               | datetime
EndDate                 | datetime
NextBeginDate           | datetime
Rest in Hours           | integer
Drive Time in Minutes   | integer
Drive KM                | decimal(10,3)
--------------------------------------------
Rest in Hours = NextBeginDate - EndDate
Drive Time in Minutes = EndDate - BeginDate
I have to find the first rest >= 36 hours, then:
Do
    Compute the number of days, SUM(DriveTime) and SUM(DriveKM)
    until the next rest >= 36 hours
    IF there is no more rest >= 36 hours, EXIT DO
Loop
Everything from the beginning up to the first such rest is discarded.
Everything from the last such rest to the end is discarded.
I have example data in an Excel sheet you can download here: Download Excel with data example
Sorry for my English; I hope you can understand and help me. Thank you in advance.
There are several parts to the query. The first part pulls out the rows where Rest >= 36 and numbers them per driver in BeginDate order. The result is stored in a CTE called BigRest.
with BigRest(RowNumber, DriverId, BeginDate, EndDate)
as
(
select ROW_NUMBER() over(partition by d.DriverId order by d.DriverId, d.BeginDate) RowNumber,d.DriverId, d.BeginDate, d.EndDate
from Drive d
where d.Rest >= 36
)
Then, in a second CTE chained to the first with a comma (both CTEs and the final SELECT form a single statement), I give each row in Drive (which is what I'm calling the table that has all the data in it) the RowNumber of the first BigRest row for the same driver whose BeginDate is on or after the row's own BeginDate. So the data is effectively segmented by the rows where Rest >= 36, and each segment gets a number called DriveGroup.
, Grouped(DriverId, BeginDate, EndDate, DriveTime, DriveKM, DriveGroup)
as
(
select d.DriverId, d.BeginDate, d.EndDate, d.Drivetime, d.DriveKM, (select Top 1 RowNumber from BigRest b where b.DriverId = d.DriverId and b.BeginDate >= d.BeginDate order by b.BeginDate)
from Drive d
)
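Finally, I select the data from Grouped, cross-applying each row with aggregates (total drive time and KM, earliest BeginDate, latest EndDate) over all rows of the same driver and DriveGroup. We can filter out the rows where DriveGroup is 1 or NULL: group 1 holds the rows up to and including the first big rest, and NULL marks the rows after the last one, which are the beginning and end rows the question says to discard.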
select distinct DriverId, MinBeginDate BeginDate, MaxEndDate EndDate, DATEDIFF(D, MinBeginDate, MaxEndDate)+1 Days, DriveTimeSum Drive, DriveKMSum KM
from
(
select g.DriverId, g.BeginDate, g.EndDate, g.DriveGroup, g.DriveTime, c.DriveTimeSum, c.DriveKMSum, c.MinBeginDate, c.MaxEndDate
from Grouped g
cross apply(select SUM(g2.DriveTime) DriveTimeSum,
SUM(g2.DriveKM) DriveKMSum,
MIN(g2.BeginDate) MinBeginDate,
MAX(g2.EndDate) MaxEndDate
from Grouped g2
where g2.DriverId = g.DriverId
and g2.DriveGroup = g.DriveGroup) as c
where g.DriveGroup is not null
and g.DriveGroup > 1
) x
Here's a SQL Fiddle
I'd encourage you to look at the results at each step of the query to see what's actually going on.
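The same three steps can be mirrored in a minimal Python sketch, which may help when checking the SQL against the Excel example. The `Drive` fields mirror the question's columns, but the rows are hypothetical and `drive_km` is a float rather than decimal(10,3) for brevity.

```python
from dataclasses import dataclass
from datetime import datetime
from itertools import groupby

@dataclass
class Drive:
    driver_id: int
    begin: datetime
    end: datetime
    rest_hours: int     # rest after this drive, i.e. NextBeginDate - EndDate
    drive_minutes: int
    drive_km: float     # float for brevity; the real column is decimal(10,3)

def summarize(drives, min_rest=36):
    """One tuple per segment between consecutive rests >= min_rest, per driver."""
    out = []
    drives = sorted(drives, key=lambda d: (d.driver_id, d.begin))
    for driver_id, group in groupby(drives, key=lambda d: d.driver_id):
        rows = list(group)
        # Positions of the rows followed by a big rest (the BigRest CTE).
        big = [i for i, d in enumerate(rows) if d.rest_hours >= min_rest]
        # Consecutive big rests bound one segment (DriveGroup > 1); rows up to the
        # first big rest (DriveGroup 1) and after the last one (NULL) are skipped.
        for prev, nxt in zip(big, big[1:]):
            seg = rows[prev + 1 : nxt + 1]
            begin = min(d.begin for d in seg)
            end = max(d.end for d in seg)
            days = (end.date() - begin.date()).days + 1  # DATEDIFF(D, ...) + 1
            out.append((driver_id, begin, end, days,
                        sum(d.drive_minutes for d in seg),
                        sum(d.drive_km for d in seg)))
    return out

# Hypothetical rows for one driver.
rows = [
    Drive(1, datetime(2016, 1, 1, 8), datetime(2016, 1, 1, 16), 40, 480, 500.0),
    Drive(1, datetime(2016, 1, 3, 8), datetime(2016, 1, 3, 12), 10, 240, 300.5),
    Drive(1, datetime(2016, 1, 3, 22), datetime(2016, 1, 4, 6), 50, 480, 610.25),
    Drive(1, datetime(2016, 1, 6, 8), datetime(2016, 1, 6, 10), 40, 120, 150.0),
    Drive(1, datetime(2016, 1, 8, 2), datetime(2016, 1, 8, 5), 5, 180, 200.0),
]
for segment in summarize(rows):
    print(segment)
```

The first drive (before the first big rest) and the last one (after the final big rest) are dropped, leaving two segments: two drives spanning 2 days, then one drive on a single day.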