Looking for a real quick and dirty dump of data from a table that more or less has this layout...
ID | EventType | EventDate
---+-----------+--------------------
 1 | Inbound   | 2018-07-18 00:00:00
 2 | Outbound  | 2018-07-18 12:00:00
 3 | Inbound   | 2018-07-19 00:12:00
 4 | Failure   | 2018-07-19 00:12:00
 5 | Inbound   | 2018-07-19 00:12:00
 6 | Outbound  | 2018-07-19 00:12:00
And what I want out of it is a query that spits out the count of each event type for a day. So '2018-07-19' would give me
Failures | Inbounds | Outbounds
---------+----------+----------
       1 |        2 |         1
Here's my real crap attempt at it, but I assume there's an easier way to do it. Ideally I'd be able to drop this in a view and filter by date on my own, but if I need to pass a target date to a stored proc then that's fine I suppose.
There are only 3 defined event types in my database, so my static solution of counting each one is fine. Having something that dynamically adapts to however many distinct event types exist would be better, but it's not necessary.
DECLARE @TestDate datetime2 = '2018-07-19 08:41:55'

SELECT
    'Failures'  = SUM(Failures),
    'Inbounds'  = SUM(Inbounds),
    'Outbounds' = SUM(Outbounds)
FROM (
    SELECT 'Failures' = COUNT(ID), 'Inbounds' = 0, 'Outbounds' = 0
    FROM tblTests
    WHERE EventType = 'Failed'
      AND EventDate BETWEEN CAST(@TestDate AS DATE) AND DATEADD(DAY, 1, CAST(@TestDate AS DATE))
    UNION
    SELECT 'Failures' = 0, 'Inbounds' = COUNT(ID), 'Outbounds' = 0
    FROM tblTests
    WHERE EventType = 'Inbound'
      AND EventDate BETWEEN CAST(@TestDate AS DATE) AND DATEADD(DAY, 1, CAST(@TestDate AS DATE))
    UNION
    SELECT 'Failures' = 0, 'Inbounds' = 0, 'Outbounds' = COUNT(ID)
    FROM tblTests
    WHERE EventType = 'Outbound'
      AND EventDate BETWEEN CAST(@TestDate AS DATE) AND DATEADD(DAY, 1, CAST(@TestDate AS DATE))
) FIO
The easy way to do this kind of thing is conditional aggregation:
SELECT Cast(EventDate As Date) As SummaryDate,
SUM(CASE WHEN EventType = 'Failed' THEN 1 ELSE 0 END) As Failures,
SUM(CASE WHEN EventType = 'Inbound' THEN 1 ELSE 0 END) As Inbounds,
SUM(CASE WHEN EventType = 'Outbound' THEN 1 ELSE 0 END) As Outbounds
FROM tblTests
GROUP BY Cast(EventDate As Date)
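Since the question mentions wanting to drop this in a view and filter by date, that works directly (a sketch; the view name vwDailyEventCounts is made up):

CREATE VIEW vwDailyEventCounts AS
SELECT Cast(EventDate As Date) As SummaryDate,
       SUM(CASE WHEN EventType = 'Failed' THEN 1 ELSE 0 END) As Failures,
       SUM(CASE WHEN EventType = 'Inbound' THEN 1 ELSE 0 END) As Inbounds,
       SUM(CASE WHEN EventType = 'Outbound' THEN 1 ELSE 0 END) As Outbounds
FROM tblTests
GROUP BY Cast(EventDate As Date);

-- Filter however you like, no parameter or stored proc needed:
SELECT * FROM vwDailyEventCounts WHERE SummaryDate = '2018-07-19';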
This would be a great opportunity, in my opinion, to use a PIVOT with a COUNT aggregate.
The example below walks through creating a test table, loading it with the data from the OP's original question, and pivoting the results on date.
Create the test table
create table testtable (id int, value varchar(20), dt datetime)
Load the temp data into the new table
insert into testtable
values( 1, 'Inbound', '2018-07-18 00:00:00'),
(2, 'Outbound', '2018-07-18 12:00:00'),
(3, 'Inbound' , '2018-07-19 00:12:00'),
(4, 'Failure' , '2018-07-19 00:12:00'),
(5, 'Inbound' , '2018-07-19 00:12:00'),
(6, 'Outbound' , '2018-07-19 00:12:00')
Pivot the data to the correct result
select *
from (
    select value, value as cnt_value, cast(dt as date) d
    from testtable
) a
pivot (
    count(cnt_value) for value in ([Inbound], [Outbound], [Failure])
) piv
Against the sample data above, this returns:
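d          | Inbound | Outbound | Failure
-----------+---------+----------+--------
2018-07-18 |       1 |        1 |       0
2018-07-19 |       2 |        1 |       1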
This can easily be expanded by adding additional values into the pivot.
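If you want it to adapt to however many event types exist, as the question mentions, one option is to build the IN list dynamically. A sketch, assuming SQL Server 2017+ for STRING_AGG (FOR XML PATH works on older versions):

DECLARE @cols nvarchar(max), @sql nvarchar(max);

SELECT @cols = STRING_AGG(QUOTENAME(value), ',')
FROM (SELECT DISTINCT value FROM testtable) t;

SET @sql = N'select * from (
    select value, value as cnt_value, cast(dt as date) d
    from testtable
) a
pivot (
    count(cnt_value) for value in (' + @cols + N')
) piv;';

EXEC sp_executesql @sql;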
Here's what you need
SELECT EventType, COUNT(1) AS Cnt, CONVERT(DATE, EventDate) AS EventDate
FROM dbo.tblTests
GROUP BY EventType, CONVERT(DATE, EventDate)
You can put a WHERE to limit the results to a specific date.
If you plan to run this often against a large table, I'd suggest creating a persisted computed column holding the date only.
You can do that like this:
ALTER TABLE tblTests ADD DateOnly AS CONVERT(DATE, EventDate) PERSISTED NOT NULL
Then use DateOnly in place of CONVERT(DATE, EventDate) in the above query.
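That is, the grouped query becomes:

SELECT EventType, COUNT(1) AS Cnt, DateOnly AS EventDate
FROM dbo.tblTests
GROUP BY EventType, DateOnly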
And for best performance you can create an index on EventType and the new DateOnly column, like this:
CREATE NONCLUSTERED INDEX [NCI_tblTest__EventType__DateOnly] ON [dbo].[tblTests]
(
    [EventType] ASC,
    [DateOnly] ASC
) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
and remember to rebuild indexes regularly.
EDIT:
This will give you the different event types on separate rows.
EDIT2: use PIVOT to produce the results in the exact same way as you want it. Leaving that for you to exercise :) https://learn.microsoft.com/en-us/sql/t-sql/queries/from-using-pivot-and-unpivot?view=sql-server-2017
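For reference, a sketch of that PIVOT (note the EventType column has to be duplicated in the source subquery, because the pivot column itself can't also be the aggregated column):

SELECT DateOnly, [Failed] AS Failures, [Inbound] AS Inbounds, [Outbound] AS Outbounds
FROM (SELECT EventType, EventType AS CountedType, DateOnly FROM dbo.tblTests) AS src
PIVOT (COUNT(CountedType) FOR EventType IN ([Failed], [Inbound], [Outbound])) AS p;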
I need a Postgres query to get "A value", "A value date", "B value" and "B value date".
The B value and date should be the one which falls between 95 and 100 days after the "A value date".
I have the query to get "A value" and "A value date", but I don't know how to get the B value and date by using the result (A value):
select u.id,
(select activity
from Sol_pro
where user_id = u.id
and uni_par = 'weight_m'
order by created_at asc
limit 1) as A_value
from users u;
I need the B_value and B_date from the same sol_pro table, 95-100 days after the A_value date (if more rows are there between 95 and 100 days, I need only one value, the most recent). Expected
output: id = 112233, A_Value = "weight = 210.25", A_value_date = 12-12-2020, B_value = "weight = 220.25", B_value_date = 12-12-2020
Well, without a table definition I developed it from the output columns and your original query. Further, I had to make up stuff for the data, but the following should be close enough for you to see the technique involved. It is actually a simple join operation, just that it is a self-join on the table sol_pro (i.e. it is joined to itself). Notice the comments indicated by --<<<
select distinct on (a.id)
a.id
, a.user_id --<<< assumption needed
, a.activity "A_value"
, a.created_at::date "A_date"
, b.activity "B_value"
, b.created_at::date
from sol_pro a
join sol_pro b
on ( b.user_id = a.user_id --<<< assumption
and b.uni_par = a.uni_par --<<< assumption
)
where a.id = 112233 --<<< given Orig query
and a.uni_par = 'weight_m' --<<< given Orig query, but not needed if id is PK
and b.created_at between a.created_at + interval '95 days' --<<< date range inclusive of 95-100
and a.created_at + interval '100 days'
order by a.id, b.created_at desc;
See here for an example run. The example contains a column you will not have, "belayer_note". This is just a note-to-self I sometimes use for initial testing.
Suppose that you have tables users and measures:
# create table users (id integer);
# create table measures (user_id integer, value decimal, created_at date);
They are filled with test data:
INSERT INTO users VALUES (1), (2), (3);
INSERT INTO measures VALUES (1, 100, '9/10/2020'), (1, 103, '9/15/2020'), (1, 104, '10/2/2020');
INSERT INTO measures VALUES (2, 200, '9/11/2020'), (2, 207, '9/21/2020'), (2, 204, '10/1/2020');
INSERT INTO measures VALUES (3, 300, '9/12/2020'), (3, 301, '10/1/2020'), (3, 318, '10/12/2020');
Query:
WITH M AS (
SELECT
A.user_id,
A.value AS A_value, A.created_at AS A_date,
B.value AS B_value, B.created_at AS B_date
FROM measures A
LEFT JOIN measures B ON
A.user_id = B.user_id AND
B.created_at >= A.created_at + INTERVAL '5 days' AND
B.created_at <= A.created_at + INTERVAL '10 days'
ORDER BY
user_id, A_date, B_date
)
SELECT DISTINCT ON (user_id) * FROM M;
will select for each user:
the first available measurement (A)
the next measurement (B) which is made between 5-10 days from (A).
Result:
user_id | a_value | a_date | b_value | b_date
---------+---------+------------+---------+------------
1 | 100 | 2020-09-10 | 103 | 2020-09-15
2 | 200 | 2020-09-11 | 207 | 2020-09-21
3 | 300 | 2020-09-12 | [NULL] | [NULL]
(3 rows)
P.S. You must sort table rows carefully with ORDER BY when using the DISTINCT ON () clause, because PostgreSQL will keep only the first record of each group and discard the others.
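For instance, against the measures table above, a sketch of how the sort direction decides which row survives:

-- Latest measure per user: DISTINCT ON (user_id) keeps the first row
-- per user_id in ORDER BY order, so DESC keeps the newest.
SELECT DISTINCT ON (user_id) user_id, value, created_at
FROM measures
ORDER BY user_id, created_at DESC;
-- ORDER BY user_id, created_at ASC would keep the earliest instead.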
I have the table below. I need to delete opposite rows between two dates, in pairs, based on the PerCode value.
In fact, we delete rows inside the date range that have the same PerCode and equal and opposite values.
The problem is that the begin date and end date are provided by users as parameters while reporting, but the query takes too much time if I try to delete these at runtime.
Example:
Begin date = 01/01/2018
End date = 31/12/2018
I should delete rows 3 and 4.
Do you have any idea how to do that while optimising performance? (The table has 200 million rows.)
+----+------------+---------+-------+-----------+
| Id | Date       | PerCode | Value | IsDeleted |
+----+------------+---------+-------+-----------+
| 1  | 01/10/2017 | C1      | 10    |           |
| 2  | 01/01/2018 | C1      | -10   |           |
| 3  | 15/02/2018 | C2      | 20    | 1         |
| 4  | 10/03/2018 | C2      | -20   | 1         |
| 5  | 01/12/2018 | C3      | 15    |           |
| 6  | 01/02/2019 | C3      | -15   |           |
+----+------------+---------+-------+-----------+
I had a quick go at this, using a table variable to allow me to knock together a query using your test data. However, this might not perform well when used over 200 million rows.
DECLARE @table TABLE (id INT, [date] DATE, percode CHAR(2), [value] INT, isdeleted BIT);
INSERT INTO @table
SELECT 1, '20171001', 'C1', 10, NULL
UNION ALL
SELECT 2, '20180101', 'C1', -10, NULL
UNION ALL
SELECT 3, '20180215', 'C2', 20, NULL
UNION ALL
SELECT 4, '20180310', 'C2', -20, NULL
UNION ALL
SELECT 5, '20181201', 'C3', 15, NULL
UNION ALL
SELECT 6, '20190201', 'C3', -15, NULL;
DECLARE @date_from DATE = '20180101';
DECLARE @date_to DATE = '20181231';
WITH ordered AS (
    SELECT
        id,
        percode,
        [value],
        ROW_NUMBER() OVER (PARTITION BY percode, [value] ORDER BY [value]) AS order_id
    FROM
        @table
    WHERE
        [date] BETWEEN @date_from AND @date_to
        AND ISNULL(isdeleted, 0) != 1),
matches AS (
    SELECT
        m1.id AS match_1_id,
        m2.id AS match_2_id
    FROM
        ordered m1
        INNER JOIN ordered m2 ON m1.percode = m2.percode AND m1.[value] = m2.[value] * -1 AND m1.order_id = m2.order_id)
UPDATE
    t
SET
    isdeleted = 1
FROM
    @table t
    INNER JOIN matches m ON m.match_1_id = t.id OR m.match_2_id = t.id;

SELECT * FROM @table;
Results:
id  date        percode  value  isdeleted
1   2017-10-01  C1       10     NULL
2   2018-01-01  C1       -10    NULL
3   2018-02-15  C2       20     1
4   2018-03-10  C2       -20    1
5   2018-12-01  C3       15     NULL
6   2019-02-01  C3       -15    NULL
How does it work? Well I broke the task down into steps:
make a list of all rows in the date period specified, where they aren't already deleted;
for each row of data assign it a running count number, grouped by the percode and the value. So the first C1 10 would be number #1, then the second C1 10 would be number #2, etc.;
to find matches it's simply a case of finding any value that has the same percode, the equal and opposite value to another value group, and the same running count number;
where there's a match set the isdeleted flag to 1.
Here is my code, but it is not performant over 200 million rows in real time.
In real life, PerCode is a concatenation of 5 columns (date, varchar(13), varchar(2), varchar(1) and varchar(50)) and Value is 4 numeric columns.
I am searching for other ideas.
--DECLARE @table TABLE (id INT, [date] DATE, percode CHAR(2), [value] INT, isdeleted BIT);
Select * INTO #MasterTable FROM
(
SELECT 1 id, '20171001' [date], 'C1' percode, 10 [value], NULL isdeleted
UNION ALL
SELECT 2, '20180101', 'C1', -10, NULL
UNION ALL
SELECT 3, '20180215', 'C2', 20, NULL
UNION ALL
SELECT 4, '20180310', 'C2', -20, NULL
UNION ALL
SELECT 5, '20181201', 'C3', 15, NULL
UNION ALL
SELECT 6, '20190201', 'C3', -15, NULL
) T ;
DECLARE @date_from DATE = '20180101';
DECLARE @date_to DATE = '20181231';
select F.id
Into #TmpTable
from
(
select Id, PerCode, Value
,ROW_NUMBER() over (partition by PerCode, Value order by (select 0)) Rn2
from
#MasterTable ) F
inner join (
select
PerCode
, Rn1
from (
select
PerCode
,Value
,ROW_NUMBER() over (partition by PerCode, Value order by (select 0)) Rn1
FROM #MasterTable
where
[date] BETWEEN @date_from AND @date_to
) A
group by PerCode , Rn1
having sum(Value) = 0 and count(*)>1
) B on F.PerCode = B.PerCode
and F.Rn2 = B.Rn1
update R
set IsDeleted = 1
from #MasterTable R
inner join #TmpTable P
on R.id = P.id
select * from #MasterTable
drop table #MasterTable ;
drop table #TmpTable;
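One idea: the pairing only needs a single ROW_NUMBER pass plus a join on the opposite sign, which avoids the GROUP BY ... HAVING step entirely. A sketch against the #MasterTable above (untested at 200 million rows; a covering index on (percode, [value]) INCLUDE ([date], isdeleted) would be worth trying):

;WITH pairs AS (
    SELECT id, percode, [value],
           ROW_NUMBER() OVER (PARTITION BY percode, [value] ORDER BY [date]) AS rn
    FROM #MasterTable
    WHERE [date] BETWEEN @date_from AND @date_to
      AND ISNULL(isdeleted, 0) = 0
)
UPDATE m
SET isdeleted = 1
FROM #MasterTable m
INNER JOIN pairs p   ON p.id = m.id
INNER JOIN pairs opp ON opp.percode = p.percode
                    AND opp.[value] = -p.[value]  -- equal and opposite amount
                    AND opp.rn      = p.rn;       -- nth +v row pairs with nth -v row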
I have a table of item price changes, and I want to use it to create a table of item prices for each date (between the item's launch and end dates).
Here's some code to create the data:
declare @Item table (item_id int, item_launch_date date, item_end_date date);
insert into @Item Values (1,'2001-01-01','2016-01-01'), (2,'2001-01-01','2016-01-01')
declare @ItemPriceChanges table (item_id int, item_price money, my_date date);
INSERT INTO @ItemPriceChanges VALUES (1, 123.45, '2001-01-01'), (1, 345.34, '2001-01-03'), (2, 34.34, '2001-01-01'), (2, 23.56, '2005-01-01'), (2, 56.45, '2016-05-01'), (2, 45.45, '2017-05-01');
What I'd like to see is something like this:-
item_id date price
------- ---- -----
1 2001-01-01 123.45
1 2001-01-02 123.45
1 2001-01-03 345.34
1 2001-01-04 345.34
etc.
2 2001-01-01 34.34
2 2001-01-02 34.34
etc.
Any suggestions on how to write the query?
I'm using SQL Server 2016.
Added:
I also have a calendar table called "dim_calendar" with one row per day. I had hoped to use a windowing function, but the nearest I can find is lead() and it doesn't do what I thought it would do:
select
i.item_id,
c.day_date,
ipc.item_price as item_price_change,
lead(item_price,1,NULL) over (partition by i.item_id ORDER BY c.day_date) as item_price
from dim_calendar c
inner join @Item i
on c.day_date between i.item_launch_date and i.item_end_date
left join @ItemPriceChanges ipc
on i.item_id=ipc.item_id
and ipc.my_date=c.day_date
order by
i.item_id,
c.day_date;
Thanks
I wrote this prior to your edit. Note that your sample output suggests that an item can have two prices on the day of the price change. The following assumes that an item can only have one price on a price change day and that is the new price.
declare @Item table (item_id int, item_launch_date date, item_end_date date);
insert into @Item Values (1,'2001-01-01','2016-01-01'), (2,'2001-01-01','2016-01-01')
declare @ItemPriceChange table (item_id int, item_price money, my_date date);
INSERT INTO @ItemPriceChange VALUES (1, 123.45, '2001-01-01'), (1, 345.34, '2001-01-03'), (2, 34.34, '2001-01-01'), (2, 23.56, '2005-01-01'), (2, 56.45, '2016-05-01'), (2, 45.45, '2017-05-01');
SELECT * FROM @ItemPriceChange
-- We need a table variable holding all possible date points for the output
DECLARE @DatePointList table (DatePoint date);
DECLARE @StartDatePoint date = '01-Jan-2001';
DECLARE @MaxDatePoint date = GETDATE();
DECLARE @DatePoint date = @StartDatePoint;
WHILE @DatePoint <= @MaxDatePoint BEGIN
    INSERT INTO @DatePointList (DatePoint)
    SELECT @DatePoint;
    SET @DatePoint = DATEADD(DAY,1,@DatePoint);
END;
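As an aside, the WHILE loop can be replaced with a set-based recursive CTE (a sketch; OPTION (MAXRECURSION 0) lifts the default 100-level cap, which a multi-year range would exceed):

WITH Dates AS (
    SELECT @StartDatePoint AS DatePoint
    UNION ALL
    SELECT DATEADD(DAY, 1, DatePoint) FROM Dates WHERE DatePoint < @MaxDatePoint
)
INSERT INTO @DatePointList (DatePoint)
SELECT DatePoint FROM Dates
OPTION (MAXRECURSION 0);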
-- We can use a CTE to sequence the price changes
WITH ItemPriceChange AS (
SELECT item_id, item_price, my_date, ROW_NUMBER () OVER (PARTITION BY Item_id ORDER BY my_date ASC) AS SeqNo
FROM @ItemPriceChange
)
-- With the price changes sequenced, we can derive from and to dates for each price and use a join
-- to the table of date points to produce the output. Also, use an inner join back to @Item to only
-- return rows for dates that are within the start/end date of the item
SELECT ItemPriceDate.item_id, DatePointList.DatePoint, ItemPriceDate.item_price
FROM @DatePointList AS DatePointList
INNER JOIN (
SELECT ItemPriceChange.item_id, ItemPriceChange.item_price, ItemPriceChange.my_date AS from_date, ISNULL(ItemPriceChange_Next.my_date, @MaxDatePoint) AS to_date
FROM ItemPriceChange
LEFT OUTER JOIN ItemPriceChange AS ItemPriceChange_Next ON ItemPriceChange_Next.item_id = ItemPriceChange.item_id AND ItemPriceChange.SeqNo = ItemPriceChange_Next.SeqNo - 1
) AS ItemPriceDate ON DatePointList.DatePoint >= ItemPriceDate.from_date AND DatePointList.DatePoint < ItemPriceDate.to_date
INNER JOIN @Item AS item ON item.item_id = ItemPriceDate.item_id AND DatePointList.DatePoint BETWEEN item.item_launch_date AND item.item_end_date
ORDER BY ItemPriceDate.item_id, DatePointList.DatePoint;
@AlphaStarOne Perfect! I've modified it to use a windowing function rather than a self-join, but what you've suggested works. Here's my implementation of that in case anyone else needs it:
SELECT
ipd.item_id,
dc.day_date,
ipd.item_price
FROM dim_calendar dc
INNER JOIN (
SELECT
item_id,
item_price,
my_date AS from_date,
isnull(lead(my_date,1,NULL) over (partition by item_id ORDER BY my_date),getdate()) as to_date
FROM @ItemPriceChange ipc1
) AS ipd
ON dc.day_date >= ipd.from_date
AND dc.day_date < ipd.to_date
INNER JOIN @Item AS i
ON i.item_id = ipd.item_id
AND dc.day_date BETWEEN i.item_launch_date AND i.item_end_date
ORDER BY
ipd.item_id,
dc.day_date;
I'm trying to return values from a table so that I get 1 row per PurchaseID with multiple columns for the buyers' first and last names.
E.g. I have a table with the following data:
| PurchaseID | FirstName | LastName |
|------------|-----------|----------|
| 1          | Joe       | Smith    |
| 1          | Peter     | Pan      |
| 2          | Max       | Power    |
| 2          | Jack      | Frost    |
I'm trying to write a query that returns the values like so
| PurchaseID | Buyer1FirstName | Buyer1LastName | Buyer2FirstName | Buyer2LastName |
|------------|-----------------|----------------|-----------------|----------------|
| 1          | Joe             | Smith          | Peter           | Pan            |
| 2          | Max             | Power          | Jack            | Frost          |
I've been looking online, but because I'm not sure how to explain in words what I want to do, I'm not having much luck. I'm hoping that with a more visual explanation someone could point me in the right direction.
Any help would be awesome.
You can use ROW_NUMBER as below:
DECLARE #Tbl TABLE (PurchaseID INT, FirstName VARCHAR(50), LastName VARCHAR(50))
INSERT INTO #Tbl
VALUES
(1, 'Joe', 'Smith'),
(1, 'Peter', 'Pan'),
(2, 'Max', 'Power'),
(2, 'Jack', 'Frost'),
(2, 'Opss', 'Sspo')
;WITH CTE
AS
(
SELECT
*, ROW_NUMBER() OVER (PARTITION BY PurchaseID ORDER BY PurchaseID) RowId
FROM #Tbl
)
SELECT
    A.PurchaseID,
    MIN(CASE WHEN A.RowId = 1 THEN A.FirstName END) Buyer1FirstName,
    MIN(CASE WHEN A.RowId = 1 THEN A.LastName  END) Buyer1LastName,
    MIN(CASE WHEN A.RowId = 2 THEN A.FirstName END) Buyer2FirstName,
    MIN(CASE WHEN A.RowId = 2 THEN A.LastName  END) Buyer2LastName,
    MIN(CASE WHEN A.RowId = 3 THEN A.FirstName END) Buyer3FirstName,
    MIN(CASE WHEN A.RowId = 3 THEN A.LastName  END) Buyer3LastName,
    MIN(CASE WHEN A.RowId = 4 THEN A.FirstName END) Buyer4FirstName,
    MIN(CASE WHEN A.RowId = 4 THEN A.LastName  END) Buyer4LastName
FROM
    CTE A
GROUP BY
    A.PurchaseID
Result:
PurchaseID Buyer1FirstName Buyer1LastName Buyer2FirstName Buyer2LastName Buyer3FirstName Buyer3LastName Buyer4FirstName Buyer4LastName
----------- ------------------- -------------------- -------------------- ------------------ ------------------- ----------------- ------------------- --------------
1 Joe Smith Peter Pan NULL NULL NULL NULL
2 Max Power Jack Frost Opss Sspo NULL NULL
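If the number of buyers per purchase isn't known in advance, the same pattern can be generated dynamically. A sketch, assuming the rows live in a permanent table (here a made-up dbo.PurchaseBuyers, since EXEC can't see a table variable from the outer scope):

DECLARE @maxBuyers INT, @cols NVARCHAR(MAX) = N'', @sql NVARCHAR(MAX);

SELECT @maxBuyers = MAX(cnt)
FROM (SELECT COUNT(*) cnt FROM dbo.PurchaseBuyers GROUP BY PurchaseID) t;

DECLARE @i INT = 1;
WHILE @i <= @maxBuyers
BEGIN
    -- One FirstName/LastName column pair per buyer position
    SET @cols += N',
    MIN(CASE WHEN RowId = ' + CAST(@i AS NVARCHAR(10)) + N' THEN FirstName END) Buyer' + CAST(@i AS NVARCHAR(10)) + N'FirstName,
    MIN(CASE WHEN RowId = ' + CAST(@i AS NVARCHAR(10)) + N' THEN LastName END) Buyer' + CAST(@i AS NVARCHAR(10)) + N'LastName';
    SET @i += 1;
END;

SET @sql = N'WITH CTE AS (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY PurchaseID ORDER BY PurchaseID) RowId
    FROM dbo.PurchaseBuyers
)
SELECT PurchaseID' + @cols + N'
FROM CTE
GROUP BY PurchaseID;';

EXEC sp_executesql @sql;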
I have a data set that I want to parse to see multi-touch attribution. The data set is made up of leads who responded to a marketing campaign and their marketing source.
Each lead can respond to multiple campaigns and I want to get their first marketing source and their last marketing source in the same table.
I was thinking I could create two tables and use a select statement from both.
The first table would attempt to create a table with the most recent marketing source from every person (using email as their unique ID).
create table temp.multitouch1 as (
select distinct on (email) email, date, market_source as last_source
from sf.campaignmember
where date >= '1/1/2016' ORDER BY DATE DESC);
Then I would create a table with deduped emails but this time for the first source.
create table temp.multitouch2 as (
select distinct on (email) email, date, market_source as first_source
from sf.campaignmember
where date >= '1/1/2016' ORDER BY DATE ASC);
Finally I wanted to simply select the email and join the first and last market sources to it each in their own column.
select a.email, a.last_source, b.first_source, a.date
from temp.multitouch1 a
left join temp.multitouch2 b on b.email = a.email
Since DISTINCT ON doesn't work on Redshift's version of PostgreSQL, I was hoping someone had an idea to solve this issue in another way.
EDIT 2/22: For more context, I'm dealing with people and campaigns they've responded to. Each record is a "campaign response", and every person can have more than one campaign response with multiple sources. I'm trying to make a select statement which would dedupe by person and then have columns for the first campaign/marketing source they've responded to and the last campaign/marketing source they've responded to, respectively.
EDIT 2/24: Ideal output is a table with 4 columns: email, last_source, first_source, date.
The first and last source columns would be the same for people with only 1 campaign member record and different for everyone who has more than 1 campaign member record.
I believe you could use row_number() inside case expressions like this:
SELECT
email
, MIN(first_source) AS first_source
, MIN(date) first_date
, MAX(last_source) AS last_source
, MAX(date) AS last_date
FROM (
SELECT
email
, date
, CASE
WHEN ROW_NUMBER() OVER (PARTITION BY email ORDER BY date ASC) = 1 THEN market_source
ELSE NULL
END AS first_source
, CASE
WHEN ROW_NUMBER() OVER (PARTITION BY email ORDER BY date DESC) = 1 THEN market_source
ELSE NULL
END AS last_source
FROM sf.campaignmember
WHERE date >= '2016-01-01'
) s
WHERE first_source IS NOT NULL
OR last_source IS NOT NULL
GROUP BY
email
tested here: SQL Fiddle
PostgreSQL 9.3 Schema Setup:
CREATE TABLE campaignmember
(email varchar(3), date timestamp, market_source varchar(1))
;
INSERT INTO campaignmember
(email, date, market_source)
VALUES
('a@a', '2016-01-02 00:00:00', 'x'),
('a@a', '2016-01-03 00:00:00', 'y'),
('a@a', '2016-01-04 00:00:00', 'z'),
('b@b', '2016-01-02 00:00:00', 'x')
;
Query 1:
SELECT
email
, MIN(first_source) AS first_source
, MIN(date) first_date
, MAX(last_source) AS last_source
, MAX(date) AS last_date
FROM (
SELECT
email
, date
, CASE
WHEN ROW_NUMBER() OVER (PARTITION BY email ORDER BY date ASC) = 1 THEN market_source
ELSE NULL
END AS first_source
, CASE
WHEN ROW_NUMBER() OVER (PARTITION BY email ORDER BY date DESC) = 1 THEN market_source
ELSE NULL
END AS last_source
FROM campaignmember
WHERE date >= '2016-01-01'
) s
WHERE first_source IS NOT NULL
OR last_source IS NOT NULL
GROUP BY
email
Results:
| email | first_source | first_date                | last_source | last_date                 |
|-------|--------------|---------------------------|-------------|---------------------------|
| a@a   | x            | January, 02 2016 00:00:00 | z           | January, 04 2016 00:00:00 |
| b@b   | x            | January, 02 2016 00:00:00 | x           | January, 02 2016 00:00:00 |
And a small extension to the request: count the number of contact points.
SELECT
email
, MIN(first_source) AS first_source
, MIN(date) first_date
, MAX(last_source) AS last_source
, MAX(date) AS last_date
, MAX(numof) AS Numberof_Contacts
FROM (
SELECT
email
, date
, CASE
WHEN ROW_NUMBER() OVER (PARTITION BY email ORDER BY date ASC) = 1 THEN market_source
ELSE NULL
END AS first_source
, CASE
WHEN ROW_NUMBER() OVER (PARTITION BY email ORDER BY date DESC) = 1 THEN market_source
ELSE NULL
END AS last_source
, COUNT(*) OVER (PARTITION BY email) as numof
FROM campaignmember
WHERE date >= '2016-01-01'
) s
WHERE first_source IS NOT NULL
OR last_source IS NOT NULL
GROUP BY
email
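Against the fiddle data above, this should report Numberof_Contacts = 3 for a@a and 1 for b@b.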
You can use the good old left join groupwise maximum.
SELECT DISTINCT c1.email, c1.date, c1.market_source
FROM sf.campaignmember c1
LEFT JOIN sf.campaignmember c2
  ON c1.email = c2.email AND c1.date > c2.date AND c1.id > c2.id
 AND c2.date >= '1/1/2016' -- keep the range filter in the ON clause, or the LEFT JOIN behaves like an INNER JOIN
LEFT JOIN sf.campaignmember c3
  ON c1.email = c3.email AND c1.date < c3.date AND c1.id > c3.id
WHERE c1.date >= '1/1/2016'
  AND (c2.email IS NULL OR c3.email IS NULL)
This assumes you have a unique id column; if (date, email) is unique, id is not needed.
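If you'd rather skip the self-joins entirely, FIRST_VALUE/LAST_VALUE with a full window frame are also supported on Redshift; a sketch (untested against your schema):

SELECT DISTINCT
    email,
    FIRST_VALUE(market_source) OVER (PARTITION BY email ORDER BY date
        ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS first_source,
    LAST_VALUE(market_source) OVER (PARTITION BY email ORDER BY date
        ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS last_source,
    MAX(date) OVER (PARTITION BY email) AS last_date
FROM sf.campaignmember
WHERE date >= '2016-01-01';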