T-SQL Query with SUM, DATEADD logic - tsql

I'm looking for a query that can calculate the Calculation column from the Value column.
For each record, the query has to take the Date value, for example '2017-03-01', and add the values from the records up to 2 months before it, but only from records with the same CustomerID. In this scenario it must take the values from 2017-03-01, 2017-02-01 and 2017-01-01 (993, 492, 312), sum them (1,797), and store the result in the 2017-03-01 record, as shown below for the customer where CustomerID = 1001.
|1001 | 2017-01-01 | 312 | |
|1001 | 2017-02-01 | 492 | |
|1001 | 2017-03-01 | 993 | 1797 |
This needs to be done for all records.
Of course, some records cannot go back 2 months; their values can stay NULL.
I really don't know how to write this query.
I have some test queries as first steps, like:
SELECT SUM(Value) FROM TestTable WHERE Date BETWEEN Date AND DATEADD(month, -2, Date);
+------------+------------+-------+-------------+
| CustomerID | Date | Value | Calculation |
+------------+------------+-------+-------------+
| 1001 | 2016-08-01 | 123 | |
| 1001 | 2016-09-01 | 434 | |
| 1001 | 2016-10-01 | 423 | |
| 1001 | 2016-11-01 | 235 | |
| 1001 | 2016-12-01 | 432 | |
| 1001 | 2017-01-01 | 312 | |
| 1001 | 2017-02-01 | 492 | |
| 1001 | 2017-03-01 | 993 | |
| 1002 | 2017-01-01 | 838 | |
| 1002 | 2017-02-01 | 234 | |
| 1002 | 2017-03-01 | 453 | |
| 1002 | 2017-04-01 | 838 | |
| 1003 | 2017-01-01 | 746 | |
| 1003 | 2017-02-01 | 242 | |
| 1003 | 2017-03-01 | 432 | |
| 1004 | 2017-01-01 | 431 | |
| 1004 | 2017-02-01 | 113 | |
+------------+------------+-------+-------------+
I want my table to look like this:
+------------+------------+-------+-------------+
| CustomerID | Date | Value | Calculation |
+------------+------------+-------+-------------+
| 1001 | 2016-08-01 | 123 | NULL |
| 1001 | 2016-09-01 | 434 | NULL |
| 1001 | 2016-10-01 | 423 | 980 |
| 1001 | 2016-11-01 | 235 | 1092 |
| 1001 | 2016-12-01 | 432 | 1090 |
| 1001 | 2017-01-01 | 312 | 979 |
| 1001 | 2017-02-01 | 492 | 1236 |
| 1001 | 2017-03-01 | 993 | 1797 |
| 1002 | 2017-01-01 | 838 | NULL |
| 1002 | 2017-02-01 | 234 | NULL |
| 1002 | 2017-03-01 | 453 | 1525 |
| 1002 | 2017-04-01 | 838 | 1525 |
| 1003 | 2017-01-01 | 746 | NULL |
| 1003 | 2017-02-01 | 242 | NULL |
| 1003 | 2017-03-01 | 432 | 1420 |
| 1004 | 2017-01-01 | 431 | NULL |
| 1004 | 2017-02-01 | 113 | NULL |
+------------+------------+-------+-------------+
I hope you can help me with this! 😉

--First Create Table
create table Testtable(
CustomerID int,
Date date,
Value int
)
--Insert test values
insert into Testtable VALUES(1001,'2016-08-01',123),
(1001,'2016-09-01',434),
(1001,'2016-10-01',423),
(1001,'2016-11-01',235),
(1001,'2016-12-01',432),
(1001,'2017-01-01',312),
(1001,'2017-02-01',492),
(1001,'2017-03-01',993),
(1002,'2017-01-01',838),
(1002,'2017-02-01',234),
(1002,'2017-03-01',453),
(1002,'2017-04-01',838),
(1003,'2017-01-01',746),
(1003,'2017-02-01',242),
(1003,'2017-03-01',432),
(1004,'2017-01-01',431),
(1004,'2017-02-01',113);
--Select Query
SELECT
CustomerID,
Date,
Value,
CASE WHEN (SELECT COUNT(*) FROM Testtable T4 WHERE T4.CustomerID = T3.CustomerID AND T4.Date < T3.Date) < 2 THEN NULL
ELSE Calculation END AS Calculation
FROM
(SELECT
*,
(SELECT SUM(T2.Value) FROM Testtable T2 WHERE T.CustomerID = T2.CustomerID AND T2.Date BETWEEN DATEADD(month,-2,T.Date) AND T.Date) AS Calculation
FROM Testtable T) AS T3

This may take some trial and error to get completely right, but try the query below (replace table_name with the name of your table):
SELECT CustomerID
, Date
, Value
, Value +
(SELECT Value from table_name where CustomerID = x.CustomerID and Date =
DATEADD(m,-1,x.Date)) +
(SELECT Value from table_name where CustomerID = x.CustomerID and Date =
DATEADD(m,-2,x.Date)) as Calculation
FROM table_name x
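Note that this will only work if (CustomerID, Date) is a composite key in your table, so that each subquery returns at most one row. When one of the two earlier months is missing, its subquery returns NULL and the whole sum becomes NULL, which leaves Calculation empty for those records.
Hope this helps!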
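As an alternative to both answers, the same rolling sum can be sketched with a window frame instead of correlated subqueries. This is only a sketch under assumptions: it uses Python's sqlite3 module (SQLite 3.25 or later) as a stand-in for SQL Server, it loads a subset of the question's rows, and it assumes every customer has exactly one row per month with no gaps, because the frame counts rows rather than months.

```python
import sqlite3

# In-memory SQLite database used only as a stand-in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Testtable (CustomerID INTEGER, Date TEXT, Value INTEGER)")
conn.executemany(
    "INSERT INTO Testtable VALUES (?, ?, ?)",
    [
        (1001, "2017-01-01", 312),
        (1001, "2017-02-01", 492),
        (1001, "2017-03-01", 993),
        (1002, "2017-01-01", 838),
        (1002, "2017-02-01", 234),
    ],
)

# Sum the current row and the two rows before it for each customer.
# The CASE returns NULL unless the frame really contains three rows.
query = """
SELECT CustomerID, Date, Value,
       CASE WHEN COUNT(*) OVER (PARTITION BY CustomerID ORDER BY Date
                                ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) = 3
            THEN SUM(Value) OVER (PARTITION BY CustomerID ORDER BY Date
                                  ROWS BETWEEN 2 PRECEDING AND CURRENT ROW)
       END AS Calculation
FROM Testtable
ORDER BY CustomerID, Date
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

If months can be missing for a customer, the date-based subquery approaches above are the safer choice, since they match on the actual dates.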

Related

R, Group By in Subquery

I was practicing on some subqueries and I got stuck on a problem. This is for the table below (snippet). The question is "From the following tables, write a SQL query to find those employees whose salaries exceed 50% of their department's total salary bill. Return first name, last name."
My query is this below, but it does not run. I ran the subquery by itself, and it ran fine. I think it's something to do with the GROUP BY in the subquery.
SELECT first_name, last_name
FROM employees
WHERE salary >
(
SELECT (sum(salary)) / 2
FROM employees
GROUP BY department_id
)
The correct answer from the practice is below. Is creating table e2 necessary?
SELECT e1.first_name, e1.last_name
FROM employees e1
WHERE salary >
( SELECT (SUM(salary))*.5
FROM employees e2
WHERE e1.department_id=e2.department_id);
+-------------+-------------+-------------+----------+--------------------+------------+------------+----------+----------------+------------+---------------+
| EMPLOYEE_ID | FIRST_NAME | LAST_NAME | EMAIL | PHONE_NUMBER | HIRE_DATE | JOB_ID | SALARY | COMMISSION_PCT | MANAGER_ID | DEPARTMENT_ID |
+-------------+-------------+-------------+----------+--------------------+------------+------------+----------+----------------+------------+---------------+
| 100 | Steven | King | SKING | 515.123.4567 | 2003-06-17 | AD_PRES | 24000.00 | 0.00 | 0 | 90 |
| 101 | Neena | Kochhar | NKOCHHAR | 515.123.4568 | 2005-09-21 | AD_VP | 17000.00 | 0.00 | 100 | 90 |
| 102 | Lex | De Haan | LDEHAAN | 515.123.4569 | 2001-01-13 | AD_VP | 17000.00 | 0.00 | 100 | 90 |
| 103 | Alexander | Hunold | AHUNOLD | 590.423.4567 | 2006-01-03 | IT_PROG | 9000.00 | 0.00 | 102 | 60 |
| 104 | Bruce | Ernst | BERNST | 590.423.4568 | 2007-05-21 | IT_PROG | 6000.00 | 0.00 | 103 | 60 |
| 105 | David | Austin | DAUSTIN | 590.423.4569 | 2005-06-25 | IT_PROG | 4800.00 | 0.00 | 103 | 60 |
| 106 | Valli | Pataballa | VPATABAL | 590.423.4560 | 2006-02-05 | IT_PROG | 4800.00 | 0.00 | 103 | 60 |
| 107 | Diana | Lorentz | DLORENTZ | 590.423.5567 | 2007-02-07 | IT_PROG | 4200.00 | 0.00 | 103 | 60 |
| 108 | Nancy | Greenberg | NGREENBE | 515.124.4569 | 2002-08-17 | FI_MGR | 12008.00 | 0.00 | 101 | 100 |
| 109 | Daniel | Faviet | DFAVIET | 515.124.4169 | 2002-08-16 | FI_ACCOUNT | 9000.00 | 0.00 | 108 | 100 |
| 110 | John | Chen | JCHEN | 515.124.4269 | 2005-09-28 | FI_ACCOUNT | 8200.00 | 0.00 | 108 | 100 |
I expected my query to run, but it did not execute. The editor on the website I'm practicing on does not give any error information.
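A likely reason the first query is rejected: with GROUP BY department_id the subquery returns one row per department, while a comparison such as salary > (...) needs a single value, and most engines raise an error in that case. In the practice answer, e2 is not a new table but an alias for a second reference to employees, which the correlated WHERE clause needs in order to tell the inner row from the outer one. Below is a minimal sketch using Python's sqlite3 module; the employee rows are hypothetical and chosen so that the result is not empty.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (first_name TEXT, last_name TEXT, "
    "salary REAL, department_id INTEGER)"
)
# Hypothetical rows, not taken from the question's table.
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?, ?)",
    [
        ("Ann", "Alpha", 9000, 10),
        ("Bob", "Beta", 1000, 10),
        ("Cy", "Gamma", 2000, 10),
        ("Dee", "Delta", 5000, 20),
        ("Eve", "Epsilon", 5000, 20),
    ],
)

# The grouped subquery on its own yields one row per department.
halves = conn.execute(
    "SELECT department_id, SUM(salary) / 2 FROM employees "
    "GROUP BY department_id ORDER BY department_id"
).fetchall()
print(halves)

# Correlated version: for each outer row e1, the inner query sums only
# the salaries of e1's own department, so it returns exactly one value.
rows = conn.execute(
    """
    SELECT e1.first_name, e1.last_name
    FROM employees e1
    WHERE e1.salary > (SELECT SUM(e2.salary) * 0.5
                       FROM employees e2
                       WHERE e2.department_id = e1.department_id)
    """
).fetchall()
print(rows)
```

Note that SQLite itself does not raise an error for a scalar subquery that returns several rows (it silently uses the first one), so this harness cannot reproduce the failure, only the working form.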

Return unique grouped rows with the latest timestamp [duplicate]

At the moment I'm struggling with a problem that looks very easy.
Table content:
Primary keys: Timestamp, COL_A, COL_B, COL_C, COL_D
+------------------+-------+-------+-------+-------+--------+--------+
| Timestamp | COL_A | COL_B | COL_C | COL_D | Data_A | Data_B |
+------------------+-------+-------+-------+-------+--------+--------+
| 31.07.2019 15:12 | - | - | - | - | 1 | 2 |
| 31.07.2019 15:32 | 1 | 1 | 100 | 1 | 5000 | 20 |
| 10.08.2019 09:33 | - | - | - | - | 1000 | 7 |
| 31.07.2019 15:38 | 1 | 1 | 100 | 1 | 33 | 5 |
| 06.08.2019 08:53 | - | - | - | - | 0 | 7 |
| 06.08.2019 09:08 | - | - | - | - | 0 | 7 |
| 06.08.2019 16:06 | 3 | 3 | 3 | 3 | 0 | 23 |
| 07.08.2019 10:43 | - | - | - | - | 0 | 42 |
| 07.08.2019 13:10 | - | - | - | - | 0 | 24 |
| 08.08.2019 07:19 | 11 | 111 | 111 | 12 | 0 | 2 |
| 08.08.2019 10:54 | 2334 | 65464 | 565 | 76 | 1000 | 19 |
| 08.08.2019 11:15 | 232 | 343 | 343 | 43 | 0 | 2 |
| 08.08.2019 11:30 | 2323 | rtttt | 3434 | 34 | 0 | 2 |
| 10.08.2019 14:47 | - | - | - | - | 123 | 23 |
+------------------+-------+-------+-------+-------+--------+--------+
Needed query output:
+------------------+-------+-------+-------+-------+--------+--------+
| Timestamp | COL_A | COL_B | COL_C | COL_D | Data_A | Data_B |
+------------------+-------+-------+-------+-------+--------+--------+
| 31.07.2019 15:38 | 1 | 1 | 100 | 1 | 33 | 5 |
| 06.08.2019 16:06 | 3 | 3 | 3 | 3 | 0 | 23 |
| 08.08.2019 07:19 | 11 | 111 | 111 | 12 | 0 | 2 |
| 08.08.2019 10:54 | 2334 | 65464 | 565 | 76 | 1000 | 19 |
| 08.08.2019 11:15 | 232 | 343 | 343 | 43 | 0 | 2 |
| 08.08.2019 11:30 | 2323 | rtttt | 3434 | 34 | 0 | 2 |
| 10.08.2019 14:47 | - | - | - | - | 123 | 23 |
+------------------+-------+-------+-------+-------+--------+--------+
As you can see, I'm trying to get single rows for my primary keys, using the latest timestamp, which is also a primary key.
Currently, I tried a query like:
SELECT Timestamp, COL_A, COL_B, COL_C, COL_D, Data_A, Data_B FROM XY op
WHERE Timestamp = (
SELECT MAX(Timestamp) FROM XY AS tsRow
WHERE op.COL_A = tsRow.COL_A
AND op.COL_B = tsRow.COL_B
AND op.COL_C = tsRow.COL_C
AND op.COL_D = tsRow.COL_D
);
which gives me a result that looks fine at first glance.
Is there a better or safer way to get my preferred result?
You can use the DISTINCT ON clause, which gives you the first record of an ordered group. Here your group is your (A, B, C, D). This is ordered by the Timestamp column, in descending order, to get the most recent record to be the first.
SELECT DISTINCT ON ("COL_A", "COL_B", "COL_C", "COL_D")
*
FROM
mytable
ORDER BY "COL_A", "COL_B", "COL_C", "COL_D", "Timestamp" DESC
If you want to get your expected order, you need a second ORDER BY after this operation:
SELECT
*
FROM (
SELECT DISTINCT ON ("COL_A", "COL_B", "COL_C", "COL_D")
*
FROM
mytable
ORDER BY "COL_A", "COL_B", "COL_C", "COL_D", "Timestamp" DESC
) s
ORDER BY "Timestamp"
Note: If the Timestamp column is part of the PK, are you sure you really need the four other columns in the PK as well? It seems that the Timestamp column is already unique.
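DISTINCT ON is specific to PostgreSQL. On other engines the same "latest row per group" result can be sketched with ROW_NUMBER(). The example below is a sketch under assumptions: it runs on Python's sqlite3 module (SQLite 3.25 or later), uses a shortened column name ts, stores timestamps as ISO-formatted text so that they sort correctly, and loads only a few rows modelled on the question's data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (ts TEXT, col_a TEXT, col_b TEXT, col_c TEXT, "
    "col_d TEXT, data_a INTEGER, data_b INTEGER)"
)
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        ("2019-07-31 15:12", "-", "-", "-", "-", 1, 2),
        ("2019-07-31 15:32", "1", "1", "100", "1", 5000, 20),
        ("2019-07-31 15:38", "1", "1", "100", "1", 33, 5),
        ("2019-08-06 16:06", "3", "3", "3", "3", 0, 23),
        ("2019-08-10 14:47", "-", "-", "-", "-", 123, 23),
    ],
)

# Number the rows of each (A, B, C, D) group from newest to oldest,
# then keep only the newest row of every group.
rows = conn.execute(
    """
    SELECT ts, col_a, col_b, col_c, col_d, data_a, data_b
    FROM (
        SELECT *,
               ROW_NUMBER() OVER (PARTITION BY col_a, col_b, col_c, col_d
                                  ORDER BY ts DESC) AS rn
        FROM mytable
    ) AS ranked
    WHERE rn = 1
    ORDER BY ts
    """
).fetchall()
for row in rows:
    print(row)
```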

How to do sum of different values without duplicate

How do I sum different values that share the same ID, without getting a duplicate row for each distinct value in a column?
My SQL query:
SELECT
students.id AS student_id,
students.name,
COUNT(*) AS enrolled,
c2.price AS course_price,
(COUNT(*) * price) AS paid
FROM students
LEFT JOIN enrolls e on students.id = e.student_id
LEFT JOIN courses c2 on e.course_id = c2.id
WHERE student_id NOTNULL
GROUP BY students.id, students.name, c2.price
ORDER BY student_id ASC;
My result.
student_id | name | enrolled | paid
------------+---------------------+----------+------
1001 | Gulbadan Bálint | 1 | 90
1002 | Hanna Adair | 5 | 450
1003 | Taddeo Bhattacharya | 1 | 90
1004 | Persis Havlíček | 1 | 75
1004 | Persis Havlíček | 5 | 450
1005 | Tory Bateson | 1 | 90
1007 | Dávid Fèvre | 1 | 90
1008 | Masuyo Stoddard | 1 | 90
1009 | Iiris Levitt | 1 | 75
1009 | Iiris Levitt | 2 | 180
1013 | Artair Kovač | 1 | 30
1013 | Artair Kovač | 1 | 90
1015 | Matilda Guinness | 2 | 180
1017 | Margarita Ek | 1 | 90
1018 | Misti Zima | 3 | 270
1019 | Conall Ventura | 1 | 90
1020 | Vivian Monday | 2 | 180
My expected result.
student_id | name | enrolled | paid
------------+---------------------+----------+------
1001 | Gulbadan Bálint | 1 | 90
1002 | Hanna Adair | 5 | 450
1003 | Taddeo Bhattacharya | 1 | 90
1004 | Persis Havlíček | 6 | 525
1005 | Tory Bateson | 1 | 90
1007 | Dávid Fèvre | 1 | 90
1008 | Masuyo Stoddard | 1 | 90
1009 | Iiris Levitt | 3 | 255
1013 | Artair Kovač | 2 | 120
1015 | Matilda Guinness | 2 | 180
1017 | Margarita Ek | 1 | 90
1018 | Misti Zima | 3 | 270
1019 | Conall Ventura | 1 | 90
1020 | Vivian Monday | 2 | 180
I think the cause is the GROUP BY clause, but the query throws an error if I do not include price in the GROUP BY.
Perhaps you can use the SUM() function.
Please see the question linked below; it may be the same case as yours:
how to group by and return sum row in Postgres
Both your current and your expected result omit the course_price column, so it seems you included it in the GROUP BY by mistake. Grouping by c2.price is what splits a student into one row per distinct price. Remove it from the SELECT list and from the GROUP BY, and compute paid with SUM(c2.price), because COUNT(*) * price is no longer valid once price is not a grouping column:
SELECT
students.id AS student_id,
students.name,
COUNT(*) AS enrolled,
--c2.price AS course_price, --excluded from the output
SUM(c2.price) AS paid
FROM students
LEFT JOIN enrolls e on students.id = e.student_id
LEFT JOIN courses c2 on e.course_id = c2.id
WHERE student_id NOTNULL
GROUP BY students.id, students.name --c2.price removed from here
ORDER BY student_id ASC;
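To check the SUM() suggestion on a small scale, here is a minimal sketch using Python's sqlite3 module. The table and column names follow the question, but the rows are hypothetical, and plain JOINs are used because the WHERE student_id NOTNULL filter discards unenrolled students anyway.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE courses  (id INTEGER PRIMARY KEY, price INTEGER);
    CREATE TABLE enrolls  (student_id INTEGER, course_id INTEGER);
    """
)
# Hypothetical data: student 1004 takes one course at 75 and two at 90,
# student 1006 is not enrolled in anything.
conn.executemany("INSERT INTO students VALUES (?, ?)",
                 [(1001, "Gulbadan Bálint"), (1004, "Persis Havlíček"),
                  (1006, "Not Enrolled")])
conn.executemany("INSERT INTO courses VALUES (?, ?)",
                 [(1, 75), (2, 90), (3, 90)])
conn.executemany("INSERT INTO enrolls VALUES (?, ?)",
                 [(1001, 2), (1004, 1), (1004, 2), (1004, 3)])

# Group by student only; SUM(price) adds up the prices of all enrolled
# courses, whatever their individual prices are.
rows = conn.execute(
    """
    SELECT s.id AS student_id, s.name,
           COUNT(*) AS enrolled,
           SUM(c.price) AS paid
    FROM students s
    JOIN enrolls e ON s.id = e.student_id
    JOIN courses c ON e.course_id = c.id
    GROUP BY s.id, s.name
    ORDER BY s.id
    """
).fetchall()
for row in rows:
    print(row)
```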

If a user's record in column x is not null, how do I count how many records that user has after the first time it is not null?

I would like to create a count per user of number of records after the first time that x is not null for that user.
I have a table that is similar to the following:
id | user_id | completed_at | x
----+---------+--------------+---
1 | 1001 | 2017-06-01 | 1
20 | 1001 | 2017-06-01 | 2
21 | 1001 | 2017-06-02 | 4
22 | 1001 | 2017-06-03 |
24 | 1001 | 2017-06-03 |
25 | 1001 | 2017-06-04 |
23 | 1001 | 2017-06-04 |
12 | 1001 | 2017-06-06 |
13 | 1001 | 2017-06-07 |
14 | 1001 | 2017-06-08 |
2 | 1002 | 2017-06-02 | 3
27 | 1002 | 2017-06-02 | 7
15 | 1002 | 2017-06-09 |
3 | 1003 | 2017-06-03 |
4 | 1004 | 2017-06-04 |
5 | 1005 | 2017-06-05 |
33 | 1005 | 2017-06-20 | 8
34 | 1006 | 2017-07-10 | 9
6 | 1006 | 2017-10-06 |
7 | 1007 | 2017-10-07 |
8 | 1008 | 2017-10-08 |
9 | 1009 | 2017-10-09 |
10 | 1010 | 2017-10-10 |
16 | 1011 | 2017-06-01 |
11 | 1011 | 2017-07-01 | 5
17 | 1012 | 2017-06-02 |
26 | 1012 | 2017-07-02 | 6
18 | 1013 | 2017-06-03 |
19 | 1014 | 2017-06-04 |
31 | 1014 | 2017-06-24 |
32 | 1014 | 2017-06-24 |
30 | 1014 | 2017-06-24 |
29 | 1014 | 2017-06-24 |
28 | 1014 | 2017-06-24 |
The expected output would look like this:
+------+------------+---------------+
| user | first_x | records_after |
+------+------------+---------------+
| 1001 | 2017-06-01 | 9 |
| 1002 | 2017-06-02 | 2 |
| 1005 | 2017-06-20 | 0 |
| 1011 | 2017-07-01 | 0 |
| 1012 | 2017-07-02 | 0 |
+------+------------+---------------+
Use a running count of the non-null x values per user, then count the rows whose running count is greater than 0.
Sample:
WITH flags AS (
SELECT
user_id,
completed_at,
sum(CASE WHEN x IS NULL THEN 0 ELSE 1 END) OVER (PARTITION BY user_id ORDER BY completed_at ROWS BETWEEN UNBOUNDED PRECEDING AND 0 FOLLOWING) AS flag
FROM users
),
completed AS (
SELECT DISTINCT ON (user_id)
user_id,
completed_at AS first_x
FROM flags
WHERE flag > 0
ORDER BY user_id, completed_at
)
SELECT DISTINCT
user_id AS user,
first_x,
count(flag) FILTER (WHERE flag>0) - 1 AS records_after
FROM flags
NATURAL JOIN completed
GROUP BY 1, 2
ORDER BY 1
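A simpler variant of the same idea, as a sketch rather than a drop-in replacement: find each user's earliest date with a non-null x, then count that user's rows on or after that date and subtract one for the first row itself. It assumes that a row with a null x never shares its date with the user's first non-null row. The example runs on Python's sqlite3 module with a subset of the question's rows; the CTE name firsts is made up for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER, user_id INTEGER, completed_at TEXT, x INTEGER)"
)
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?, ?)",
    [
        (1, 1001, "2017-06-01", 1),
        (20, 1001, "2017-06-01", 2),
        (21, 1001, "2017-06-02", 4),
        (22, 1001, "2017-06-03", None),
        (2, 1002, "2017-06-02", 3),
        (27, 1002, "2017-06-02", 7),
        (15, 1002, "2017-06-09", None),
        (3, 1003, "2017-06-03", None),
        (5, 1005, "2017-06-05", None),
        (33, 1005, "2017-06-20", 8),
    ],
)

rows = conn.execute(
    """
    WITH firsts AS (
        -- earliest date on which x is not null, per user
        SELECT user_id, MIN(completed_at) AS first_x
        FROM users
        WHERE x IS NOT NULL
        GROUP BY user_id
    )
    SELECT u.user_id, f.first_x,
           COUNT(*) - 1 AS records_after   -- minus the first row itself
    FROM users u
    JOIN firsts f ON f.user_id = u.user_id
    WHERE u.completed_at >= f.first_x
    GROUP BY u.user_id, f.first_x
    ORDER BY u.user_id
    """
).fetchall()
for row in rows:
    print(row)
```

Users that never have a non-null x (1003 here) drop out through the join, as in the expected output.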

I have a query that groups usage by user by day how would I add a running total to this query?

I have the following query:
SELECT
usersq1.id AS user_id, name, completed_at,
COUNT(usersq1.id) AS trips,
SUM(cost_amount_cents) AS daily_cost_amount_cents
FROM usersq1
LEFT OUTER JOIN tripsq1
ON usersq1.id = user_id
GROUP by usersq1.id, name, completed_at
ORDER by user_id, name, completed_at;
Which returns the following:
user_id | name | completed_at | trips | daily_cost_amount_cents
---------+---------------------+--------------+-------+-------------------------
1001 | Makeda Mosser | 2017-06-01 | 2 | 125
1001 | Makeda Mosser | 2017-06-02 | 1 | 125
1001 | Makeda Mosser | 2017-06-03 | 2 | 350
1001 | Makeda Mosser | 2017-06-04 | 2 | 200
1001 | Makeda Mosser | 2017-06-06 | 1 | 100
1001 | Makeda Mosser | 2017-06-07 | 1 | 125
1001 | Makeda Mosser | 2017-06-08 | 1 | 150
1002 | Libbie Luby | 2017-06-02 | 2 | 125
1002 | Libbie Luby | 2017-06-09 | 1 | 175
1003 | Linn Loughran | 2017-06-03 | 1 | 75
1004 | Natacha Ned | 2017-06-04 | 1 | 100
1005 | Lorrine Lunt | 2017-06-05 | 1 | 125
1006 | Tami Tineo | 2017-10-06 | 1 | 150
1007 | Delisa Deen | 2017-10-07 | 1 | 175
1008 | Mimi Miltenberger | 2017-10-08 | 1 | 200
1009 | Seth Sneller | 2017-10-09 | 1 | 25
1010 | Rickie Rossi | 2017-10-10 | 1 | 50
1011 | Jenise Jeanbaptiste | 2017-06-01 | 1 | 200
1011 | Jenise Jeanbaptiste | 2017-07-01 | 1 | 75
1012 | Genia Glatz | 2017-06-02 | 1 | 25
1012 | Genia Glatz | 2017-07-02 | 1 | 50
1013 | Onita Oddo | 2017-06-03 | 1 | 50
1014 | Dario Dreyer | 2017-06-04 | 1 | 75
1014 | Dario Dreyer | 2017-06-24 | 5 | 750
1015 | Toby Trent | | 1 |
I would like to produce another cumulative sum column which keeps a running total of daily_cost_amount_cents per user. The expected output would look something like this:
+---------+---------------------+------------+-------+-------------------------+-----------+
| user_id | name | created_at | trips | daily_cost_amount_cents | cum_cents |
+---------+---------------------+------------+-------+-------------------------+-----------+
| 1001 | Makeda Mosser | 6/1/17 | 2 | 125 | 125 |
| 1001 | Makeda Mosser | 6/2/17 | 1 | 125 | 250 |
| 1001 | Makeda Mosser | 6/3/17 | 2 | 350 | 600 |
| 1001 | Makeda Mosser | 6/4/17 | 2 | 200 | 800 |
| 1001 | Makeda Mosser | 6/6/17 | 1 | 100 | 900 |
| 1001 | Makeda Mosser | 6/7/17 | 1 | 125 | 1025 |
| 1001 | Makeda Mosser | 6/8/17 | 1 | 150 | 1175 |
| 1002 | Libbie Luby | 6/2/17 | 2 | 125 | 125 |
| 1002 | Libbie Luby | 6/9/17 | 1 | 175 | 300 |
| 1003 | Linn Loughran | 6/3/17 | 1 | 75 | 75 |
| 1004 | Natacha Ned | 6/4/17 | 1 | 100 | 100 |
| 1005 | Lorrine Lunt | 6/5/17 | 1 | 125 | 125 |
| 1006 | Tami Tineo | 10/6/17 | 1 | 150 | 150 |
| 1007 | Delisa Deen | 10/7/17 | 1 | 175 | 175 |
| 1008 | Mimi Miltenberger | 10/8/17 | 1 | 200 | 200 |
| 1009 | Seth Sneller | 10/9/17 | 1 | 25 | 25 |
| 1010 | Rickie Rossi | 10/10/17 | 1 | 50 | 50 |
| 1011 | Jenise Jeanbaptiste | 6/1/17 | 1 | 200 | 200 |
| 1011 | Jenise Jeanbaptiste | 7/1/17 | 1 | 75 | 275 |
| 1012 | Genia Glatz | 6/2/17 | 1 | 25 | 25 |
| 1012 | Genia Glatz | 7/2/17 | 1 | 50 | 75 |
| 1013 | Onita Oddo | 6/3/17 | 1 | 50 | 50 |
| 1014 | Dario Dreyer | 6/4/17 | 1 | 75 | 75 |
| 1014 | Dario Dreyer | 6/24/17 | 5 | 750 | 825 |
| 1015 | Toby Trent | | 0 | | |
+---------+---------------------+------------+-------+-------------------------+-----------+
I am pretty sure that I need to use a window function to do this, but I can't seem to do it while preserving the grouping by user_id and completed_at.
The problem is that window functions are evaluated after GROUP BY, so in a grouped query they can only reference the grouping columns and aggregate results, not the ungrouped columns. Put your query into a WITH clause and you can easily do the windowing you want over the grouped rows:
WITH t AS (
SELECT usersq1.id AS user_id,
name,
completed_at,
COUNT(completed_at) AS trips, -- To correctly handle 0 trips
SUM(cost_amount_cents) AS daily_cost_amount_cents
FROM usersq1
LEFT OUTER JOIN tripsq1 ON usersq1.id = user_id
GROUP BY usersq1.id, name, completed_at
ORDER BY user_id, name, completed_at
) SELECT user_id,
name,
completed_at AS created_at,
trips,
daily_cost_amount_cents,
SUM(daily_cost_amount_cents) OVER (PARTITION BY user_id
ORDER BY completed_at) AS cum_cents
FROM t
ORDER BY user_id, completed_at;
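The pattern of aggregating in a CTE and then applying the window function to the grouped rows can be tried out with the minimal sketch below. It uses Python's sqlite3 module (SQLite 3.25 or later) instead of the original database, and the trip rows are hypothetical, chosen to reproduce two users from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE usersq1 (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE tripsq1 (user_id INTEGER, completed_at TEXT,
                          cost_amount_cents INTEGER);
    """
)
conn.executemany("INSERT INTO usersq1 VALUES (?, ?)",
                 [(1002, "Libbie Luby"), (1015, "Toby Trent")])
# Hypothetical trips: two on 2017-06-02, one on 2017-06-09, none for 1015.
conn.executemany("INSERT INTO tripsq1 VALUES (?, ?, ?)",
                 [(1002, "2017-06-02", 50),
                  (1002, "2017-06-02", 75),
                  (1002, "2017-06-09", 175)])

rows = conn.execute(
    """
    WITH t AS (
        -- step 1: one row per user and day
        SELECT u.id AS user_id, u.name, tr.completed_at,
               COUNT(tr.completed_at) AS trips,
               SUM(tr.cost_amount_cents) AS daily_cost_amount_cents
        FROM usersq1 u
        LEFT JOIN tripsq1 tr ON u.id = tr.user_id
        GROUP BY u.id, u.name, tr.completed_at
    )
    -- step 2: running total over the grouped rows, restarted per user
    SELECT user_id, name, completed_at, trips, daily_cost_amount_cents,
           SUM(daily_cost_amount_cents) OVER (PARTITION BY user_id
                                              ORDER BY completed_at) AS cum_cents
    FROM t
    ORDER BY user_id, completed_at
    """
).fetchall()
for row in rows:
    print(row)
```

A user without trips keeps NULL in both cost columns and gets 0 trips, because COUNT(completed_at) ignores the NULL produced by the LEFT JOIN.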