SQL Exclude Field from GROUP BY in results but use in WHERE


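Pretty simple table:
CREATE TABLE [dbo].[Recognitions](
[ID] [int] IDENTITY(1,1) NOT NULL,
[Submitter_CH_id] [int] NULL,
[Submitter_Last_Name] [varchar](50) NULL,
[Submit_Date] [datetime] NULL
-- other columns referenced in the query below (Submitter_First_Name, Submitter_Email, Submitter_Department) omitted
);
Sample data: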
Submitter_CH_id Submitter_Last_Name Submit_Date
50 Prokupek 2014-04-01 00:00:00.000
50 Prokupek 2014-04-07 00:00:00.000
50 Prokupek 2014-04-01 00:00:00.000
50 Prokupek 2014-04-07 00:00:00.000
215 Conklin 2014-04-07 00:00:00.000
215 Conklin 2014-04-07 00:00:00.000
130 Catron 2014-04-07 00:00:00.000
136 Jardee 2014-04-07 00:00:00.000
247 Emken 2014-04-07 00:00:00.000
What I need to do is get a count of all the submissions made within a certain date range, grouped by Submitter_CH_id. My app lets the user enter the date range, so the date needs to be part of the query results for my app to use it.
So far my query looks something like this:
SELECT TOP (100) PERCENT Submitter_CH_id, Submitter_First_Name, Submitter_Last_Name, Submitter_Email, Submitter_Department,
Submit_Date AS [Last Submit], COUNT(Submitter_CH_id) AS [Total Submit]
FROM dbo.Recognitions
GROUP BY Submitter_First_Name, Submitter_Last_Name, Submitter_Email, Submitter_Department, Submitter_CH_id, Submit_Date
ORDER BY Submitter_CH_id
What I would like is the following:
Submitter_CH_ID Submitter_Last_Name Total Submissions
50 Prokupek 4
215 Conklin 2
130 Catron 1
...but because I also have to include Submit_Date in my GROUP BY, the results instead show the count per ID per unique date (as they have to, of course), so I get something like this:
Submitter_CH_ID Submitter_Last_Name Total Submissions
50 Prokupek 2
50 Prokupek 2
215 Conklin 1
215 Conklin 1
130 Catron 1
Any thoughts? This is MS SQL 2008. Thanks very much.

Use a subquery, like this:
select Submitter_CH_ID, Submitter_Last_Name, count(ID) AS [Total Submissions]
from (
    select ID, Submitter_CH_ID, Submitter_Last_Name
    from dbo.Recognitions
    where Submit_Date >= @start_date and Submit_Date <= @end_date
) T
GROUP BY Submitter_CH_ID, Submitter_Last_Name
Yay, subqueries!
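The subquery approach can be tried out locally with the minimal sketch below, which uses Python's built-in sqlite3 module as a stand-in for SQL Server. It assumes the sample rows from the question, stores Submit_Date as ISO-8601 text (so string comparison orders dates correctly), and uses `?` placeholders in place of `@start_date`/`@end_date`; the helper names are illustrative only.

```python
import sqlite3

# Sample rows from the question (time parts dropped; all were midnight).
ROWS = [
    (50, "Prokupek", "2014-04-01"),
    (50, "Prokupek", "2014-04-07"),
    (50, "Prokupek", "2014-04-01"),
    (50, "Prokupek", "2014-04-07"),
    (215, "Conklin", "2014-04-07"),
    (215, "Conklin", "2014-04-07"),
    (130, "Catron", "2014-04-07"),
    (136, "Jardee", "2014-04-07"),
    (247, "Emken", "2014-04-07"),
]


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Recognitions ("
        " ID INTEGER PRIMARY KEY,"  # auto-assigned, similar to IDENTITY(1,1)
        " Submitter_CH_id INTEGER,"
        " Submitter_Last_Name TEXT,"
        " Submit_Date TEXT)"
    )
    conn.executemany(
        "INSERT INTO Recognitions (Submitter_CH_id, Submitter_Last_Name, Submit_Date)"
        " VALUES (?, ?, ?)",
        ROWS,
    )
    return conn


def totals_via_subquery(conn, start_date, end_date):
    # The inner query filters on Submit_Date; the outer query never selects
    # that column, so it does not have to appear in GROUP BY.
    sql = """
        SELECT Submitter_CH_id, Submitter_Last_Name, COUNT(ID) AS total
        FROM (
            SELECT ID, Submitter_CH_id, Submitter_Last_Name
            FROM Recognitions
            WHERE Submit_Date >= ? AND Submit_Date <= ?
        ) AS T
        GROUP BY Submitter_CH_id, Submitter_Last_Name
        ORDER BY Submitter_CH_id
    """
    return conn.execute(sql, (start_date, end_date)).fetchall()


conn = make_db()
print(totals_via_subquery(conn, "2014-04-01", "2014-04-07"))
# [(50, 'Prokupek', 4), (130, 'Catron', 1), (136, 'Jardee', 1), (215, 'Conklin', 2), (247, 'Emken', 1)]
```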

I'm guessing you feel you need Submit_Date in the GROUP BY clause because you include it in the SELECT clause so that you can filter the returned results by it. If that's correct, you can drop the field from both the SELECT and GROUP BY lists and do the filtering in the query itself instead:
SELECT
Submitter_CH_ID, Submitter_Last_Name, COUNT(*) AS Submissions
FROM
Recognitions
WHERE
Submit_Date BETWEEN @StartDate AND @EndDate
GROUP BY
Submitter_CH_ID, Submitter_Last_Name
ORDER BY
Submitter_CH_ID
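To see why moving the date from GROUP BY into WHERE fixes the split counts, the sketch below runs the grouping both ways. It again uses Python's sqlite3 module as a stand-in for SQL Server, with the question's sample rows and `?` placeholders for `@StartDate`/`@EndDate`; the function names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Recognitions (Submitter_CH_id INTEGER, Submitter_Last_Name TEXT, Submit_Date TEXT)"
)
conn.executemany(
    "INSERT INTO Recognitions VALUES (?, ?, ?)",
    [
        (50, "Prokupek", "2014-04-01"), (50, "Prokupek", "2014-04-07"),
        (50, "Prokupek", "2014-04-01"), (50, "Prokupek", "2014-04-07"),
        (215, "Conklin", "2014-04-07"), (215, "Conklin", "2014-04-07"),
        (130, "Catron", "2014-04-07"), (136, "Jardee", "2014-04-07"),
        (247, "Emken", "2014-04-07"),
    ],
)


def grouped_with_date(conn):
    # Submit_Date in GROUP BY: one output row per submitter per distinct date.
    return conn.execute(
        """SELECT Submitter_CH_id, Submitter_Last_Name, COUNT(*)
           FROM Recognitions
           GROUP BY Submitter_CH_id, Submitter_Last_Name, Submit_Date
           ORDER BY Submitter_CH_id, Submit_Date"""
    ).fetchall()


def filtered_in_where(conn, start_date, end_date):
    # The date appears only in WHERE, so each submitter collapses to one row.
    return conn.execute(
        """SELECT Submitter_CH_id, Submitter_Last_Name, COUNT(*) AS Submissions
           FROM Recognitions
           WHERE Submit_Date BETWEEN ? AND ?
           GROUP BY Submitter_CH_id, Submitter_Last_Name
           ORDER BY Submitter_CH_id""",
        (start_date, end_date),
    ).fetchall()


print(grouped_with_date(conn))  # Prokupek appears twice (2 + 2)
print(filtered_in_where(conn, "2014-04-01", "2014-04-07"))  # Prokupek once (4)
```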

Related

Window Function For Consecutive Dates

I want to know how many users were active for 3 consecutive days on any given day.
For example, on 2022-11-03, 1 user (user_id = 111) had been active 3 days in a row. Could someone please advise what kind of window function (if any) would be needed?
This is my dataset:
user_id active_date
111 2022-11-01
111 2022-11-02
111 2022-11-03
222 2022-11-01
333 2022-11-01
333 2022-11-09
333 2022-11-10
333 2022-11-11

If you are confident there are no duplicate user_id + active_date rows in the source data, then you can use two LAG functions like this:
SELECT user_id,
active_date,
CASE WHEN DATEADD(day, -1, active_date) = LAG(active_date, 1) OVER (PARTITION BY user_id ORDER BY active_date)
AND DATEADD(day, -2, active_date) = LAG(active_date, 2) OVER (PARTITION BY user_id ORDER BY active_date)
THEN 'Yes'
ELSE 'No'
END AS rowof3
FROM your_table
ORDER BY user_id, active_date;
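If there might be duplicate rows, use this FROM clause instead (aliasing the derived table and the cast column keeps the outer references to active_date valid):
FROM (SELECT DISTINCT user_id, active_date::DATE AS active_date FROM your_table) AS t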
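A runnable sketch of the same two-LAG check, using Python's sqlite3 module (window functions need SQLite 3.25 or later) in place of the answer's unspecified engine: `date(active_date, '-1 days')` stands in for `DATEADD(day, -1, active_date)`, the DISTINCT subquery guards against duplicates, and a final GROUP BY counts how many users complete a 3-day streak on each date, which is the number the question asks for. The table name `activity` and the extra duplicate row are assumptions.

```python
import sqlite3

# Dataset from the question, plus one duplicate row to exercise DISTINCT.
ROWS = [
    (111, "2022-11-01"), (111, "2022-11-02"), (111, "2022-11-03"),
    (222, "2022-11-01"),
    (333, "2022-11-01"), (333, "2022-11-09"), (333, "2022-11-10"), (333, "2022-11-11"),
    (111, "2022-11-02"),  # duplicate
]

QUERY = """
WITH d AS (
    -- de-duplicate so that LAG(active_date, 2) really looks two days back
    SELECT DISTINCT user_id, active_date FROM activity
),
flagged AS (
    SELECT user_id, active_date,
           CASE WHEN date(active_date, '-1 days') =
                     LAG(active_date, 1) OVER (PARTITION BY user_id ORDER BY active_date)
                 AND date(active_date, '-2 days') =
                     LAG(active_date, 2) OVER (PARTITION BY user_id ORDER BY active_date)
                THEN 1 ELSE 0 END AS rowof3
    FROM d
)
SELECT active_date, SUM(rowof3) AS users_on_3_day_streak
FROM flagged
GROUP BY active_date
ORDER BY active_date
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE activity (user_id INTEGER, active_date TEXT)")
conn.executemany("INSERT INTO activity VALUES (?, ?)", ROWS)
result = conn.execute(QUERY).fetchall()
print(result)
```

When LAG has no earlier row it returns NULL, the comparison is not true, and the CASE falls through to 0, so early dates simply count no streaks.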

Find Accounts with X Number of Transactions within Y Days of Each Other in a Larger Date Range

I am trying to write a SQL statement that will find the accounts that, within a given week, had 3 or more transactions within 3 days of each other whose absolute value is greater than $10.00, and then return those transactions.
Consider this data...
TransactionID AccountNumber TransactionDate TransactionAmount
------------- ------------- --------------- -----------------
1 0123 2020-09-01 45.75
2 0123 2020-09-02 5.23
3 0123 2020-09-03 9.94
4 0123 2020-09-05 8.35
5 0123 2020-09-06 -16.23
6 0123 2020-09-07 14.71
7 0123 2020-09-08 15.03
8 0123 2020-09-08 23.10
9 0123 2020-09-09 94.20
10 0123 2020-09-09 5.01
11 0123 2020-09-10 3.02
12 0123 2020-09-11 4.37
13 0123 2020-09-12 4.54
14 9876 2020-09-01 -45.75
15 9876 2020-09-02 5.27
16 9876 2020-09-05 19.79
17 9876 2020-09-05 -11.64
18 9876 2020-09-06 12.42
If the week under review is 2020-09-01 to 2020-09-07 I would expect only AccountNumber 9876 to fit the criteria with TransactionIDs 16, 17, and 18 being the 3 transactions within 3 days with an absolute value greater than $10.00.
It seems like I should be able to use window functions (and perhaps framing), but I can't figure out how to start.
I have attempted this without window functions, based on the answers to this question:
multiple transactions within a certain time period, limited by date range
DECLARE
@BeginDate DATE
, @EndDate DATE
, @ThresholdAmount DECIMAL(10, 2)
, @ThresholdCount INT
, @NumberOfDays INT;
SET @BeginDate = '09/01/2020';
SET @EndDate = '09/07/2020';
SET @ThresholdAmount = 10.00;
SET @ThresholdCount = 3;
SET @NumberOfDays = 3;
-- first attempt: correlated subquery counting nearby qualifying transactions
SELECT t.*
FROM (
SELECT
t1.*
, (
SELECT COUNT(*)
FROM Transactions t2
WHERE t2.AccountNumber = t1.AccountNumber
AND t2.TransactionID <> t1.TransactionID
AND t2.TransactionDate >= t1.TransactionDate
AND t2.TransactionDate < DATEADD(DAY, @NumberOfDays, t1.TransactionDate)
AND ABS(t2.TransactionAmount) > @ThresholdAmount
) AS NumberWithinXDays
FROM Transactions t1
WHERE t1.TransactionDate BETWEEN @BeginDate AND @EndDate
AND ABS(t1.TransactionAmount) > @ThresholdAmount
) t
WHERE t.NumberWithinXDays >= @ThresholdCount;
-- second attempt: self-join grouped by account
SELECT *
FROM Transactions t
WHERE EXISTS (
SELECT *
FROM (
SELECT t1.AccountNumber
FROM Transactions t1
INNER JOIN Transactions t2 ON t1.AccountNumber = t2.AccountNumber
AND t1.TransactionID <> t2.TransactionID
AND DATEDIFF(DAY, t1.TransactionDate, t2.TransactionDate) BETWEEN 0 AND (@NumberOfDays-1)
WHERE t1.TransactionDate BETWEEN @BeginDate AND @EndDate
AND t2.TransactionDate BETWEEN @BeginDate AND @EndDate
AND ABS(t1.TransactionAmount) > @ThresholdAmount
AND ABS(t2.TransactionAmount) > @ThresholdAmount
GROUP BY t1.AccountNumber
HAVING COUNT(t1.TransactionID) >= @ThresholdCount
) x
WHERE x.AccountNumber = t.AccountNumber
)
AND t.TransactionDate BETWEEN @BeginDate AND @EndDate
AND ABS(t.TransactionAmount) > @ThresholdAmount;
My first query comes back with...
TransactionID AccountNumber TransactionDate TransactionAmount NumberWithinXDays
------------- ------------- --------------- ----------------- -----------------
5 0123 2020-09-06 -16.23 3
6 0123 2020-09-07 14.71 3
Not even close. And the second query returns...
TransactionID AccountNumber TransactionDate TransactionAmount
------------- ------------- --------------- -----------------
14 9876 2020-09-01 -45.75
16 9876 2020-09-05 19.79
17 9876 2020-09-05 -11.64
18 9876 2020-09-06 12.42
Closer, but not restricted to just the transactions within 3 days of each other. This is the result I want:
TransactionID AccountNumber TransactionDate TransactionAmount
------------- ------------- --------------- -----------------
16 9876 2020-09-05 19.79
17 9876 2020-09-05 -11.64
18 9876 2020-09-06 12.42
Now it is certainly possible I have not implemented these suggested queries correctly. Or maybe there is some subtle difference I am missing and they just don't fit my situation.
Any suggestions on fixing either of my attempted queries or something completely different with or without window functions?
Here is a full dbfiddle of my code.

I was not able to come up with a solution using window functions. As I thought about it more, I thought I might be able to use a CTE, but I could not figure that out either.
I solved it using a couple of subqueries. I was concerned about performance, given that my transaction table has 86 million rows. However, it runs in less than 30 seconds, and that is good enough for me.
-- DISTINCT is needed because a particular transaction may fit into more than
-- one transaction window, but we only want to see it once in the results
SELECT DISTINCT
t.TransactionID
, t.AccountNumber
, t.TransactionDate
, t.TransactionAmount
FROM (
SELECT
t1.AccountNumber
, t1.TransactionDateWindowBegin
, t1.TransactionDateWindowEnd
, COUNT(DISTINCT t2.TransactionID) AS Count
FROM (
-- establish the transaction window for each transaction within the
-- larger date range and an absolute value above the threshold
SELECT
TransactionID
, AccountNumber
, TransactionDate AS [TransactionDateWindowBegin]
, DATEADD(DAY, @NumberOfDays - 1, TransactionDate) AS [TransactionDateWindowEnd]
, TransactionAmount
FROM Transactions
WHERE TransactionDate BETWEEN @BeginDate AND @EndDate
AND ABS(TransactionAmount) > @ThresholdAmount
) t1
-- join back to the transaction table to find transactions within the transaction window for
-- each transaction, count them, and only keep those that are above the threshold count
INNER JOIN Transactions t2 ON t1.AccountNumber = t2.AccountNumber
AND t1.TransactionDateWindowBegin <= t2.TransactionDate
AND t1.TransactionDateWindowEnd >= t2.TransactionDate
WHERE t2.TransactionDate BETWEEN @BeginDate AND @EndDate
AND ABS(t2.TransactionAmount) > @ThresholdAmount
GROUP BY t1.AccountNumber
, t1.TransactionDateWindowBegin
, t1.TransactionDateWindowEnd
HAVING COUNT(DISTINCT t2.TransactionID) >= @ThresholdCount
) x
-- join back to the transaction table again to get the details for the
-- transactions that meet the threshold amount and count criteria
INNER JOIN Transactions t ON x.AccountNumber = t.AccountNumber
AND x.TransactionDateWindowBegin <= t.TransactionDate
AND x.TransactionDateWindowEnd >= t.TransactionDate
AND ABS(t.TransactionAmount) > @ThresholdAmount
-- a window can extend past @EndDate, so keep the returned rows inside the review week
AND t.TransactionDate BETWEEN @BeginDate AND @EndDate;
Here is the full demo.
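For comparison, the window-function approach the question hoped for is possible on engines whose RANGE frames accept value offsets. SQL Server's RANGE frames only accept UNBOUNDED and CURRENT ROW bounds (and SQL Server 2008 has no frame clause at all), so the sketch below uses Python's sqlite3 module (RANGE offsets need SQLite 3.28 or later) purely to illustrate the idea: for each eligible transaction, count the eligible transactions of the same account dated from that day up to 2 days later, keep windows with at least 3, then return the eligible transactions inside those windows. Dates are turned into day numbers with `julianday`; the function name and table layout are assumptions.

```python
import sqlite3

TRANSACTIONS = [
    (1, "0123", "2020-09-01", 45.75), (2, "0123", "2020-09-02", 5.23),
    (3, "0123", "2020-09-03", 9.94), (4, "0123", "2020-09-05", 8.35),
    (5, "0123", "2020-09-06", -16.23), (6, "0123", "2020-09-07", 14.71),
    (7, "0123", "2020-09-08", 15.03), (8, "0123", "2020-09-08", 23.10),
    (9, "0123", "2020-09-09", 94.20), (10, "0123", "2020-09-09", 5.01),
    (11, "0123", "2020-09-10", 3.02), (12, "0123", "2020-09-11", 4.37),
    (13, "0123", "2020-09-12", 4.54), (14, "9876", "2020-09-01", -45.75),
    (15, "9876", "2020-09-02", 5.27), (16, "9876", "2020-09-05", 19.79),
    (17, "9876", "2020-09-05", -11.64), (18, "9876", "2020-09-06", 12.42),
]


def flagged_transactions(conn, week_begin, week_end, amount, min_count, days):
    # Frame offset is inlined as a validated int rather than bound as a parameter.
    span = int(days) - 1
    sql = f"""
    WITH eligible AS (
        -- inside the review week and above the amount threshold
        SELECT TransactionID, AccountNumber, TransactionDate, TransactionAmount,
               julianday(TransactionDate) AS jd
        FROM Transactions
        WHERE TransactionDate BETWEEN :week_begin AND :week_end
          AND ABS(TransactionAmount) > :amount
    ),
    windows AS (
        -- eligible rows of the same account from this day up to span days later
        SELECT AccountNumber, jd AS window_start,
               COUNT(*) OVER (PARTITION BY AccountNumber ORDER BY jd
                              RANGE BETWEEN CURRENT ROW AND {span} FOLLOWING) AS n
        FROM eligible
    )
    SELECT DISTINCT e.TransactionID, e.AccountNumber, e.TransactionDate, e.TransactionAmount
    FROM eligible AS e
    JOIN windows AS w
      ON w.AccountNumber = e.AccountNumber
     AND e.jd BETWEEN w.window_start AND w.window_start + {span}
    WHERE w.n >= :min_count
    ORDER BY e.TransactionID
    """
    params = {"week_begin": week_begin, "week_end": week_end,
              "amount": amount, "min_count": min_count}
    return conn.execute(sql, params).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Transactions (TransactionID INTEGER PRIMARY KEY, AccountNumber TEXT,"
    " TransactionDate TEXT, TransactionAmount REAL)"
)
conn.executemany("INSERT INTO Transactions VALUES (?, ?, ?, ?)", TRANSACTIONS)
rows = flagged_transactions(conn, "2020-09-01", "2020-09-07", 10.00, 3, 3)
for row in rows:
    print(row)
```

Filtering to the review week happens before the window is computed; otherwise account 0123's 2020-09-06 window would pick up the 2020-09-08 transactions and qualify.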

SQL Server - Renumber in Order

I have a table in which I need to renumber a column, but I need to keep the original order by date.
TABLE_1
id num_seq DateTimeStamp
fb4e1683-7035-4895-b2c8-d084d9b42ce3 111 08-02-2005
e40e4c3e-65e4-47b7-b13a-79e8bce2d02d 114 10-07-2017
49e261a8-a855-4844-a0ac-37b313da2222 113 01-30-2010
6c4bffb7-a056-4a20-ae1c-5a31bdf683f2 112 04-15-2006
I want to renumber num_seq from 1001 through 1004 while keeping the numbering in date order. So 111 = 1001, 112 = 1002, and so forth.
This is what I have so far:
DECLARE @num INT
SET @num = 0
UPDATE Table_1
SET @num = num_seq = @num + 1
GO
I know that UPDATE doesn't let me use the keyword ORDER BY. Is there a way to do this in SQL 2008 R2?

Stage the new num_seq in a CTE, then leverage that in your update statement:
declare @Table_1 table (id uniqueidentifier, num_seq int, DateTimeStamp datetime);
insert into @Table_1
values
('fb4e1683-7035-4895-b2c8-d084d9b42ce3', 111, '08-02-2005'),
('e40e4c3e-65e4-47b7-b13a-79e8bce2d02d', 114, '10-07-2017'),
('49e261a8-a855-4844-a0ac-37b313da2222', 113, '01-30-2010'),
('6c4bffb7-a056-4a20-ae1c-5a31bdf683f2', 112, '04-15-2006');
;with stage as
(
select *,
num_seq_new = 1000 + row_number()over(order by DateTimeStamp asc)
from @Table_1
)
update stage
set num_seq = num_seq_new;
select * from @Table_1;
Returns:
id num_seq DateTimeStamp
FB4E1683-7035-4895-B2C8-D084D9B42CE3 1001 2005-08-02 00:00:00.000
E40E4C3E-65E4-47B7-B13A-79E8BCE2D02D 1004 2017-10-07 00:00:00.000
49E261A8-A855-4844-A0AC-37B313DA2222 1003 2010-01-30 00:00:00.000
6C4BFFB7-A056-4A20-AE1C-5A31BDF683F2 1002 2006-04-15 00:00:00.000
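The same renumbering can be checked with Python's sqlite3 module (window functions need SQLite 3.25+). SQLite cannot UPDATE through a CTE the way SQL Server can, so this sketch, an adaptation rather than the answer's exact statement, looks up each row's ROW_NUMBER-based value from a ranked subquery keyed on id. Dates are assumed to be stored as ISO-8601 text so that ordering the text orders the dates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table_1 (id TEXT PRIMARY KEY, num_seq INTEGER, DateTimeStamp TEXT)")
conn.executemany(
    "INSERT INTO Table_1 VALUES (?, ?, ?)",
    [
        ("fb4e1683-7035-4895-b2c8-d084d9b42ce3", 111, "2005-08-02"),
        ("e40e4c3e-65e4-47b7-b13a-79e8bce2d02d", 114, "2017-10-07"),
        ("49e261a8-a855-4844-a0ac-37b313da2222", 113, "2010-01-30"),
        ("6c4bffb7-a056-4a20-ae1c-5a31bdf683f2", 112, "2006-04-15"),
    ],
)

# 1000 + ROW_NUMBER() ordered by date gives 1001, 1002, ... oldest first.
# The ordering depends only on DateTimeStamp, so updating num_seq row by row
# cannot disturb the ranking.
conn.execute(
    """
    UPDATE Table_1
    SET num_seq = (
        SELECT ranked.num_seq_new
        FROM (SELECT id,
                     1000 + ROW_NUMBER() OVER (ORDER BY DateTimeStamp) AS num_seq_new
              FROM Table_1) AS ranked
        WHERE ranked.id = Table_1.id
    )
    """
)

for row in conn.execute("SELECT id, num_seq, DateTimeStamp FROM Table_1 ORDER BY DateTimeStamp"):
    print(row)
```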

Postgresql Query for display of records every 45 days

I have a table that holds each user_id and the timestamp at which that user joined.
If I need to display the data month-wise I could just use:
select
count(user_id),
date_trunc('month',(to_timestamp(users.timestamp))::timestamp)::date
from
users
group by 2
date_trunc accepts units such as 'second', 'day', and 'week', so I could get data grouped by those periods.
How do I get data grouped by "n-day" period say 45 days ?
Basically I need to display number users per 45 day period.
Any suggestion or guidance appreciated!
Currently I get:
Date Users
2015-03-01 47
2015-04-01 72
2015-05-01 123
2015-06-01 132
2015-07-01 136
2015-08-01 166
2015-09-01 129
2015-10-01 189
I would like the data to come in 45 days interval. Something like :-
Date Users
2015-03-01 85
2015-04-15 157
2015-05-30 192
2015-07-14 229
2015-08-28 210
2015-10-12 294
UPDATE:
I used the following to get the output, but one problem remains. I'm getting values that are offset.
with
new_window as (
select
generate_series as cohort
, lag(generate_series, 1) over () as cohort_lag
from
(
select
*
from
generate_series('2015-03-01'::date, '2016-01-01', '45 day')
)
t
)
select
--cohort
cohort_lag -- This worked. !!!
, count(*)
from
new_window
join users on
user_timestamp <= cohort
and user_timestamp > cohort_lag
group by 1
order by 1
But the output I am getting is:
Date Users
2015-04-15 85
2015-05-30 157
2015-07-14 193
2015-08-28 225
2015-10-12 210
Basically, the users displayed at 2015-03-01 should be the users who joined between 2015-03-01 and 2015-04-15, and so on.
But I seem to be getting the users up to each date, i.e. 85 users up to 2015-04-15, which is not the result I want.
Any help here?

Try this query:
SELECT to_char(i::date,'YYYY-MM-DD') as date, 0 as users
FROM generate_series('2015-03-01', '2015-11-30','45 day'::interval) as i;
OUTPUT :
date users
2015-03-01 0
2015-04-15 0
2015-05-30 0
2015-07-14 0
2015-08-28 0
2015-10-12 0
2015-11-26 0
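The series above only supplies the bucket boundaries. A minimal sketch of joining those boundaries to the users, so that each row counts the users who joined in the half-open period [start, start + 45 days) as the question asks, is shown below with Python's sqlite3 module. A recursive CTE stands in for PostgreSQL's generate_series, and the `users(user_id, joined)` table with ISO date text is an assumption; the sample dates are the ones used in the next answer.

```python
import sqlite3

# Join dates taken from the sample data in the next answer.
JOINED = [
    "2015-03-01", "2015-03-26", "2015-04-04", "2015-04-06", "2015-05-06",
    "2015-05-19", "2015-06-16", "2015-06-27", "2015-07-09", "2015-07-15",
    "2015-08-08", "2015-08-09", "2015-08-22", "2015-08-27",
]

QUERY = """
WITH RECURSIVE buckets(bucket_start) AS (
    SELECT date(:first)
    UNION ALL
    SELECT date(bucket_start, '+45 days') FROM buckets
    WHERE date(bucket_start, '+45 days') <= date(:last)
)
SELECT b.bucket_start, COUNT(u.user_id) AS users
FROM buckets AS b
LEFT JOIN users AS u
  ON u.joined >= b.bucket_start
 AND u.joined < date(b.bucket_start, '+45 days')  -- start inclusive, end exclusive
GROUP BY b.bucket_start
ORDER BY b.bucket_start
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (user_id INTEGER PRIMARY KEY, joined TEXT)")
conn.executemany("INSERT INTO users (joined) VALUES (?)", [(d,) for d in JOINED])
result = conn.execute(QUERY, {"first": "2015-03-01", "last": "2015-11-30"}).fetchall()
print(result)
```

Labelling each row with the bucket's start and comparing with `>=` / `<` is what keeps a user who joined on a boundary date in exactly one period; the LEFT JOIN keeps empty periods with a count of 0.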

This looks like a hot mess, and it might be better wrapped in a function where you could use some variables, but would something like this work?
with number_of_intervals as (
select
min (timestamp)::date as first_date,
ceiling (extract (days from max (timestamp) - min (timestamp)) / 45)::int as num
from users
),
intervals as (
select
generate_series(0, num - 1, 1) int_start,
generate_series(1, num, 1) int_end
from number_of_intervals
),
date_spans as (
select
n.first_date + 45 * i.int_start as interval_start,
n.first_date + 45 * i.int_end as interval_end
from
number_of_intervals n
cross join intervals i
)
select
d.interval_start, count (*) as user_count
from
users u
join date_spans d on
u.timestamp >= d.interval_start and
u.timestamp < d.interval_end
group by
d.interval_start
order by
d.interval_start
With this sample data:
User Id timestamp derived range count
1 3/1/2015 3/1-4/15
2 3/26/2015 "
3 4/4/2015 "
4 4/6/2015 " (4)
5 5/6/2015 4/16-5/30
6 5/19/2015 " (2)
7 6/16/2015 5/31-7/14
8 6/27/2015 "
9 7/9/2015 " (3)
10 7/15/2015 7/15-8/28
11 8/8/2015 "
12 8/9/2015 "
13 8/22/2015 "
14 8/27/2015 " (5)
Here is the output:
2015-03-01 4
2015-04-15 2
2015-05-30 3
2015-07-14 5
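An alternative that avoids building a list of spans is to compute each user's bucket directly: whole days since a fixed origin, integer-divided by 45, multiplied back by 45 and added to the origin. The sketch below shows this arithmetic with Python's sqlite3 module on the sample data above, stored as ISO date text in an assumed `users(user_id, joined)` table. It is a different technique from the answer's CTEs, and unlike a join to generated spans it omits periods with no users. In PostgreSQL the same idea applies, because subtracting one date from another yields an integer number of days.

```python
import sqlite3

JOINED = [
    "2015-03-01", "2015-03-26", "2015-04-04", "2015-04-06", "2015-05-06",
    "2015-05-19", "2015-06-16", "2015-06-27", "2015-07-09", "2015-07-15",
    "2015-08-08", "2015-08-09", "2015-08-22", "2015-08-27",
]


def counts_by_bucket(conn, origin, days=45):
    # bucket_start = origin + days * floor((joined - origin) / days)
    # CAST(... AS INTEGER) truncates toward zero, so rows before the origin are excluded.
    sql = """
    SELECT date(:origin, '+' || (CAST((julianday(joined) - julianday(:origin)) / :days AS INTEGER)
                                 * :days) || ' days') AS bucket_start,
           COUNT(*) AS users
    FROM users
    WHERE joined >= :origin
    GROUP BY bucket_start
    ORDER BY bucket_start
    """
    return conn.execute(sql, {"origin": origin, "days": days}).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (user_id INTEGER PRIMARY KEY, joined TEXT)")
conn.executemany("INSERT INTO users (joined) VALUES (?)", [(d,) for d in JOINED])
print(counts_by_bucket(conn, "2015-03-01"))
```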

T-SQL Query / Count Rows by Partial Date Grouping

I have a table (in SQL Server 2008 R2, in case that matters), defined with the following rows. The DateAdded column is a SmallDateTime data type.
ID DateAdded
1 2012-08-01 12:34:02
2 2012-08-01 12:48:25
3 2012-08-05 08:50:22
4 2012-08-05 11:32:01
5 2012-08-26 09:22:15
6 2012-08-26 13:42:02
7 2012-08-27 08:22:12
What I need to do is count the rows that occur on the same YYYY/MM/DD value. So the results I need to obtain would look like this...
DateAdded QTY
2012-08-01 2
2012-08-05 2
2012-08-26 2
2012-08-27 1
I can't figure out the syntax/expression to get this to work. Can someone point me in the right direction? Thank you!

SELECT
DateAdded = DATEADD(DAY, DATEDIFF(DAY, 0, DateAdded), 0),
QTY = COUNT(*)
FROM dbo.tablename
GROUP BY DATEADD(DAY, DATEDIFF(DAY, 0, DateAdded), 0);
Or, as Marc rightly pointed out, SQL Server 2008 has a DATE type, so you can simply convert to it (I spent more time looking at the formatting botches than the tags):
SELECT
DateAdded = CONVERT(DATE, DateAdded),
QTY = COUNT(*)
FROM dbo.tablename
GROUP BY CONVERT(DATE, DateAdded);
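For a quick local check of the idea, SQLite's `date()` function plays the role of `CONVERT(DATE, DateAdded)`: it strips the time portion of an ISO-8601 timestamp so that rows from the same day group together. This minimal sketch uses Python's sqlite3 module, the question's sample rows, and the answer's placeholder table name `tablename`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tablename (ID INTEGER PRIMARY KEY, DateAdded TEXT)")
conn.executemany(
    "INSERT INTO tablename VALUES (?, ?)",
    [
        (1, "2012-08-01 12:34:02"), (2, "2012-08-01 12:48:25"),
        (3, "2012-08-05 08:50:22"), (4, "2012-08-05 11:32:01"),
        (5, "2012-08-26 09:22:15"), (6, "2012-08-26 13:42:02"),
        (7, "2012-08-27 08:22:12"),
    ],
)

# Group on the date part only, exactly as the CONVERT(DATE, ...) version does.
rows = conn.execute(
    """SELECT date(DateAdded) AS DateAdded, COUNT(*) AS QTY
       FROM tablename
       GROUP BY date(DateAdded)
       ORDER BY 1"""
).fetchall()
print(rows)
```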
