Tableau date math with multiple dates

I have a table of policies with account number, effective, cancellation, and expiration dates. I want to know how long an account had active policies for.
Raw Table
id, account_id, effective_date, cancellation_date, expiration_date
1, a, 2020-01-01, null, 2020-06-01
2, b, 2020-02-01, null, 2020-07-01
3, b, 2020-03-01, null, 2020-08-01
4, a, 2020-04-01, null, 2020-09-01
5, b, 2020-04-01, 2020-08-15, 2020-09-01
Ideal output
account_id, active_date, inactive_date, active_time
a, 2020-01-01, 2020-09-01, 9 months
b, 2020-02-01, 2020-08-15, 7 months 15 days
So far I have made a table with account_id as the left-hand column, and MIN(effective_date) to get the active date of the first policy.
Then I have Policy_Inactive_Date = MIN(cancellation_date, expiration_date), but that gives me the date the first policy expired or was cancelled.
It feels like I need to do MAX(Policy_Inactive_Date), but that throws an error.
I'm wondering if I first need to get Policy_Inactive_Date at the policy level, and then take the max at the account level.

Do it like this.
active_dt field:
{ FIXED [account_id] : MIN([effective_date]) }
inactive_dt field:
{ FIXED [account_id] : MAX(IF ISNULL(MIN([cancellation_date],[expiration_date])) THEN [expiration_date] ELSE MIN([cancellation_date],[expiration_date]) END) }

Try MAX(MIN(cancellation_date, expiration_date))
There are two forms of MIN() -- and two forms of MAX(). With one argument, MIN() is an aggregate function, returning the least non-null value of that argument across a set of records. With two arguments, MIN() is evaluated row by row and returns the smaller of the two values.
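Putting the two together, a minimal sketch of the calculated fields (the field names, and the use of IFNULL and DATEDIFF, are assumptions based on the question, which implies expiration_date is never null):
// Account Active Date
{ FIXED [account_id] : MIN([effective_date]) }
// Account Inactive Date: the row-level MIN picks each policy's end date, MAX takes the latest per account
{ FIXED [account_id] : MAX(MIN(IFNULL([cancellation_date],[expiration_date]),[expiration_date])) }
// Active Days
DATEDIFF('day', [Account Active Date], [Account Inactive Date])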

Related

Adding Row Number for Specific Field

I have the data below in SQL.
DOB         Status         Policy   StartDate    EndDate
1/05/1983   Lapsed         P1       5/05/2015    5/06/2016
1/05/1983   New Business   P2       3/05/2016
2/04/1999   Lapsed         P3       5/02/2016    10/06/2017
2/04/1999   New Business   P4       10/07/2017
3/06/1972   Lapsed         P5       6/07/2016    15/12/2017
3/06/1972   New Business   P6       1/10/2017
4/12/1954   Lapsed         P7       7/03/2017    1/03/2018
4/12/1954   New Business   P8       1/03/2018
I need to add a descending number based on the DOB field; the expected result is supposed to look like the one below.
Unfortunately, I can only get the number '1' in the # column.
For the # column, I have tried index(), Window_Count(Countd(DOB), 04, 0), and Running_Total (Table Down, Pane Down, Specific Dimension: DOB), but nothing works.
I'm using Tableau desktop/server 10.0.
Thanks all for the help.
Use the RANK_DENSE function:
"Returns the dense rank for the current row in the partition. Identical values are assigned an identical rank, but no gaps are inserted into the number sequence. Use the optional 'asc' | 'desc' argument to specify ascending or descending order. The default is descending.
With this function, the set of values (6, 9, 9, 14) would be ranked (3, 2, 2, 1)."
RANK_DENSE(SUM(FLOAT([DOB])),'asc')
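If [DOB] is an actual date field, an alternative sketch (my assumption, not part of the original answer) is to rank the date aggregate directly and set the table calculation to compute using DOB:
// descending dense rank of the DOB values
RANK_DENSE(MIN([DOB]), 'desc')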

Postgres Function: how to return the first full set of data that occurs after specified date/time

I have a requirement to extract rows of data, but only if all said rows make a full set. We have a sequence table that is updated every minute, with data for 80 bins. We need to know the status of bins 1 thru 80 every minute as part of our production process.
I am generating a new report (postgres function) that needs to take a snapshot at roughly 00:01:00 AM (i.e. 1 minute past midnight). Initially I thought this would be an easy task: just grab the first 80 rows of data that occur at/after this time. However, depending on network activity and industrial computer priorities, the table is not religiously updated at exactly 00:01:00 AM, or at any exact minute for that matter. Updates can occur milliseconds or even seconds later, and take 500 ms to 800 ms to update the database. Sometimes a given minute can be missing altogether (production processes take precedence over data capture, but the sequence data is not super critical anyway).
My thinking is it would be more reliable to look for the first complete set of data any time from 00:01:00 AM onwards. So effectively, I have a table that looks a bit like the one in the attached image (apologies, I know you prefer text tables, but I could not figure out how to create one here; carriage returns are ignored).
Basically, the above table is typical, but the 1st minute is not guaranteed, and for that matter I would not be 100% confident that all 80 bins are logged for a given minute. Hence my question: how do I return the first complete set of data, where all 80 bins (rows) have been captured for a particular minute?
Thinking about it, I could do some sort of row count in the function, ensuring there are 80 rows for a given minute, but this seems less intuitive. I would like to know for sure that for each row of a given minute, bin 1 is represented, bin 2, bin 3, and so on.
Ultimately a call to this function will supply a min/max date/time, and that period of time will be checked for the first available minute with a full set of bin data.
I am reasonably sure this will involve a window function, as all rows have to be assessed prior to data extraction. I've used window functions a few times now, but I'm still a green newbie compared to others here, so help is appreciated.
My final code, thanks to help from klin:
StartTime := DATE_TRUNC('minute', tme1);
EndTime := DATE_TRUNC('day', tme1) + '23 hours'::interval;

SELECT "BinSequence".*
FROM "BinSequence"
JOIN (
    SELECT "binMinute" AS binminute, count("binMinute")
    FROM "BinSequence"
    WHERE ("binTime" >= StartTime) AND ("binTime" < EndTime)
    GROUP BY 1
    HAVING COUNT(DISTINCT "binBinNo") = 80 -- verifies that each and every bin is represented in the returned data
) theseTuplesOnly
ON theseTuplesOnly.binminute = "BinSequence"."binMinute"
WHERE ("binTime" >= StartTime) AND ("binTime" < EndTime)
ORDER BY 1
LIMIT 80;
Use the aggregate function count(*) grouping data by minutes (date_trunc('minute', datestamp) gives full minutes from datestamp), e.g.:
create table bins(datestamp time, bin int, param text);
insert into bins values
('00:01:10', 1, 'a'),
('00:01:20', 2, 'b'),
('00:01:30', 3, 'c'),
('00:01:40', 4, 'd'),
('00:02:10', 3, 'e'),
('00:03:10', 2, 'f'),
('00:03:10', 3, 'g'),
('00:03:10', 4, 'h');
select date_trunc('minute', datestamp) as minute, count(bin)
from bins
group by 1
order by 1;
minute | count
----------+-------
00:01:00 | 4
00:02:00 | 1
00:03:00 | 3
(3 rows)
If you are not sure that all bins are unique in consecutive minutes, use distinct (this will make the query slower):
select date_trunc('minute', datestamp) as minute, count(distinct bin)
...
You cannot select counts aggregated by minute and all the columns of the table in a single simple select. If you want to do that, you should join a derived table, use the IN operator, or use a window function. A join seems to be the simplest:
select b.*, count
from bins b
join (
    select date_trunc('minute', datestamp) as minute, count(bin)
    from bins
    group by 1
    having count(bin) = 4
) s
on date_trunc('minute', datestamp) = minute
order by 1;
datestamp | bin | param | count
-----------+-----+-------+-------
00:01:10 | 1 | a | 4
00:01:20 | 2 | b | 4
00:01:30 | 3 | c | 4
00:01:40 | 4 | d | 4
(4 rows)
Note also how HAVING is used to filter the results in the above query.
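For comparison, a sketch of the window-function variant mentioned above, against the same bins table (this assumes each bin appears at most once per minute, since DISTINCT is not allowed in window aggregates):

-- count the rows per minute with a window function instead of a join
select datestamp, bin, param
from (
    select b.*,
           count(*) over (partition by date_trunc('minute', datestamp)) as bins_in_minute
    from bins b
) t
where bins_in_minute = 4
order by 1;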

Access version 2000 & 2013 SQL pull latest date, MAX doesn't work

I have a table that needs to pull the latest date from different categories, and the date might not always be filled out. I have tried MAX, MIN, etc., but it has not worked.
e.g.
ID     1st Game Date   2nd Game Date   3rd Game Date
Joe    6/1/16          missing         missing
Anna   missing         7/2/16          7/6/16
Rita   missing         7/31/16         missing
Needs to Return:
ID     Date
Joe    6/1/16
Anna   7/6/16
Rita   7/31/16
I do have this SQL that works well, but it requires that all the dates are filled in; otherwise it doesn't return the latest date:
ApptDate: Switch(
    [Pt1stApptDate]>=[2ndApptDate] And [Pt1stApptDate]>=[3rdApptDate], [Pt1stApptDate],
    [2ndApptDate]>=[Pt1stApptDate] And [2ndApptDate]>=[3rdApptDate], [2ndApptDate],
    [3rdApptDate]>=[Pt1stApptDate] And [3rdApptDate]>=[2ndApptDate], [3rdApptDate])
Much appreciation in advance for all your help
Use the Nz function:
ApptDate: Switch(
    Nz([Pt1stApptDate],0)>=Nz([2ndApptDate],0) And Nz([Pt1stApptDate],0)>=Nz([3rdApptDate],0), Nz([Pt1stApptDate],0),
    Nz([2ndApptDate],0)>=Nz([Pt1stApptDate],0) And Nz([2ndApptDate],0)>=Nz([3rdApptDate],0), Nz([2ndApptDate],0),
    Nz([3rdApptDate],0)>=Nz([Pt1stApptDate],0) And Nz([3rdApptDate],0)>=Nz([2ndApptDate],0), Nz([3rdApptDate],0))
Having said that, your table design is incorrect.
You should be storing each ApptDate per ID in a separate row:
ApptID  ID    ApptDate   ApptNr
1       Joe   6/1/2016   1
2       Anna  7/2/2016   2
3       Anna  7/6/2016   3
4       Rita  7/31/2016  2
where ApptID is an AutoNumber and ApptNr is a sequence per ID (what you seem to call a category).
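With that design, pulling the latest date per ID is a plain aggregate query; a sketch, assuming the table is named Appointments:
SELECT ID, Max(ApptDate) AS LatestDate
FROM Appointments
GROUP BY ID;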
When you are having problems writing what should be simple queries (SQL DML), you should consider that you may have design flaws (in your SQL DDL).
The missing values force you to avoid the MAX set function and compel you to handle nulls in queries (note that the Nz() function will cause errors outside of the Access UI). It is better to model missing data by simply not adding a row to the table. Think about it: you want the smallest amount of data possible in your database, and you can infer the remainder; e.g. if Joe was not gaming on 1 Jan, 2 Jan, 3 Jan, 4 Jan, etc., then simply don't add anything to your database for those dates.
The following SQL DDL requires ANSI-92 Query Mode (but you can create the same tables/views using the Access GUI tools):
CREATE TABLE Attendance
( gamer_name VARCHAR( 35 ) NOT NULL REFERENCES Gamers ( gamer_name ),
  game_sequence INTEGER NOT NULL CHECK ( game_sequence BETWEEN 1 AND 3 ),
  game_date DATETIME NOT NULL,
  UNIQUE ( game_date, game_sequence ) );
INSERT INTO Attendance VALUES ( 'Joe', 1, '2016-06-01' );
INSERT INTO Attendance VALUES ( 'Anna', 2, '2016-07-02' );
INSERT INTO Attendance VALUES ( 'Anna', 3, '2016-07-06' );
INSERT INTO Attendance VALUES ( 'Rita', 1, '2016-07-31' );
CREATE VIEW MostRecentAttendance
AS
SELECT gamer_name, MAX ( game_date ) AS game_date
FROM Attendance
GROUP BY gamer_name;

SELECT *
FROM Attendance a
WHERE EXISTS ( SELECT *
               FROM MostRecentAttendance r
               WHERE r.gamer_name = a.gamer_name
               AND r.game_date = a.game_date );
To find the missing sequence values for players, create a table of all possible sequence numbers { 1, 2, 3 } to which you can 'anti-join' (e.g. NOT EXISTS).
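A sketch of that anti-join, assuming a helper table named Sequences with a single column seq_no holding the values 1, 2, 3:
SELECT g.gamer_name, s.seq_no
FROM Gamers AS g, Sequences AS s
WHERE NOT EXISTS ( SELECT *
                   FROM Attendance AS a
                   WHERE a.gamer_name = g.gamer_name
                   AND a.game_sequence = s.seq_no );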

PostgreSQL Calculating a Consecutive session

I have a very large table that contains 4 columns:
1) the status a member's status property changed to (online, offline, game_lobby, or load_screen),
2) the status it changed from (online, offline, game_lobby, or load_screen),
3) the member's ID number, and
4) the timestamp of when the status property changed.
I want to calculate the average time all members spend online, which would be the difference between the timestamp of a change from online to offline and the timestamp of the preceding change from offline to online:
Sample dataset (linked in the original post).
Using that sample, the average calculated would be ((01/03/2016 15:32:05 - 01/02/2016 07:18:32) + (03/14/2016 05:46:41 - 03/14/2016 04:09:04)) / 2.
Here's what I wrote, which gave me a few negative averages calculated for certain members, which can't be right:
with sessions as
( select
date_trunc('week', sc.occurred_at) as week,
sc.occurred_at,
sc.id,
timestampdiff(second,lag(sc.occurred_at) over (order by sc.id asc, sc.occurred_at),
sc.occurred_at)/3600 as session
from state_changes sc
where
((from_state = 'offline' and to_state = 'online') or
(from_state = 'offline' and to_state = 'online'))
and occurred_at at time zone 'America/New_york' > '2016-01-01'
)
select week, avg(session), id
from sessions
group by 1,3;
I can roll-up the averages into a single value instead of by member, but what I wrote is clearly wrong since a small number of the averages are returning negative. Does anyone have any suggestions?
You are basically interested in the time period between going from offline->online and then later going from ?->offline. So the trick is to get only those records in a sub-query and then compute the lag over just those rows. Your code has problems with exactly those two things; see the code below. In the main query you then take the average and throw out the offline->online rows.
SELECT date_trunc('week', logout) AS week,
       avg(extract(epoch from logout - login)), -- in seconds
       id
FROM (
    SELECT lag(occurred_at) OVER (PARTITION BY id ORDER BY occurred_at) AS login,
           occurred_at AS logout,
           id,
           to_state
    FROM state_changes
    WHERE (from_state = 'offline' OR to_state = 'offline')
      AND occurred_at > '2016-01-01') sub
WHERE to_state = 'offline'
GROUP BY 1, 3;
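To roll the averages up into a single weekly value rather than per member (as the question mentions), drop id from the select list and from the grouping. A sketch of that variation:

SELECT date_trunc('week', logout) AS week,
       avg(extract(epoch from logout - login)) AS avg_session_seconds
FROM (
    SELECT lag(occurred_at) OVER (PARTITION BY id ORDER BY occurred_at) AS login,
           occurred_at AS logout,
           to_state
    FROM state_changes
    WHERE (from_state = 'offline' OR to_state = 'offline')
      AND occurred_at > '2016-01-01') sub
WHERE to_state = 'offline'
GROUP BY 1;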

T-SQL GROUP BY HAVING COUNT returning wrong data

I have a complex T-SQL query ("complex for me, anyway") that isn't functioning the way I need it to.
The query is designed to return similar records unioned across two databases that hold similar records.
If a product fails, it will be assigned a "Failed" in one DB, or a "PF" in the other DB. "PR" means "PRODUCT READY" in both.
I am trying to return a list that includes only "Failed" or "PF" data that has fewer than two records based on the ProdNo column.
(This is to prompt the employee to test the product again; if 2 records exist in either DB, no action is needed.)
My query breaks down when I try to limit the results to only those entries that have fewer than 2 duplicate "ProdNo" values.
In other words, a product is produced and given a ProdNo number. After testing, it can be marked as a PR, PF, or Failed.
My query should never produce any results with PR, yet when a test is performed several days after the original test, PR values appear in my results.
Here is the query with notes.
-- 1st half of union query
-- Find all run failed's that do not have a PR'ed 2nd test.
Declare @daysback int
set @daysback = -2
select min(sid3)as 'ProdNo',
min([Timestamp])as 'TimeS',
min(Burn) as 'type',
min(Mixer) as 'Mixer'
from [Stat].[dbo].[oedata]
where sid3 IN
(
-- Find run faileds and PRs in Stat db
SELECT [sid3]
from [Stat].[dbo].[oedata]
where (type ='wos') and (burn = 'failed')
and (Flag = '128')
)
--- Limit Results to return only instances of 1 record
AND [Timestamp] > DATEADD( d, @daysback, getdate())
group by Sid3
having COUNT(Sid3) = 1
union all
-- Find PF's in CompanyMES MLab DB
select min(mProd_ProdNumber)as 'ProdNo',
min([Timestamp])as 'TimeS',
min(CheckType) as 'type',
min(Mixer) as 'Mixer'
from [CompanyMES].[dbo].[mLab]
where mProd_ProdNumber IN
(
-- Find failed DFs or scrap wos products
SELECT [mProd_ProdNumber]
from [CompanyMES].[dbo].[mLab]
where (CheckType = 'PF' )
)
-- Limit Results to instances with only 1 record
AND [Timestamp] > DATEADD( d, @daysback, getdate())
group by mProd_ProdNumber
having COUNT(mProd_ProdNumber) < 2
order by TimeS Desc
--------------------------------------------------------------------------
Example data and results:
ProdNo Type
=================
'1111' 'PF'
'1111' 'PR'
'1112' 'PR'
'1113' 'PF'
'1114' 'Failed'
ProdNo 1111 shouldn't return anything, as it has 2 records and a PR exists.
1113 and 1114 should return results, as they each have only 1 record and have PF and Failed types.
I think the issue is that you are applying a filter on the Timestamp in your outer queries, but not the inner one where you are filtering the Product Numbers. So, for 1111 and 1112, it could have a 'PF' (or 'Failed') outside of your timestamp filtered range, but only 'PR' inside of it (in one of the tables).
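If that is the cause, one way to line the two filters up is to apply the same Timestamp condition inside the subquery that picks the failed ProdNos. A sketch for the first half of the union (the same change would apply to the mLab half); this illustrates the suggestion and is not tested code:

-- 1st half of the union, with the date filter applied in the inner query as well
select min(sid3) as ProdNo,
       min([Timestamp]) as TimeS,
       min(Burn) as [type],
       min(Mixer) as Mixer
from [Stat].[dbo].[oedata]
where sid3 in
(
    select [sid3]
    from [Stat].[dbo].[oedata]
    where (type = 'wos') and (burn = 'failed')
      and (Flag = '128')
      and [Timestamp] > DATEADD( d, @daysback, getdate())
)
and [Timestamp] > DATEADD( d, @daysback, getdate())
group by sid3
having count(sid3) = 1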