How to merge or combine results from a query - tsql

I have the a query that returns count of items sold from current and prior month from different states.
States Prior Month Current Month
1 CA 1 2
2 NY 8 4
How would I go about merging/combining the two resulting row into something like this:
States Prior Month Current Month
1 Other States 9 4
Or is this something I have to do during the query? Any information on how I should tackle this problem would be nice.

Assuming that your current query is correct and does what you want you can wrap it up in an outer SELECT like this
SELECT 1,
'Other States' States,
SUM([Prior Month]) 'Prior Month',
SUM([Current Month]) 'Current Month'
FROM
(
-- Put your current query here
) t

Related

How can I always get the full period when grouping by week in PostgreSQL?

I'm used to do the following syntax when analysing weekly data:
select week(creation_date)::date as week,
count(*) as n
from table_1
where creation_date > current_date - 30
group by 1
However, by doing this I will get just part of the first week.
Is there any smart way to alway get a whole week in the beginning?
Like get the first day of the week I would get half of.
First off you need to define what you mean by "week". This is more difficult than it appears. While humans have an intuitive since of a week, computers are just not that smart. There are 2 common conventions: the ISO-8601 Standard and, for lack of a better term, Traditional. ISO-8601 defines a week as always beginning on Monday and always containing 7 days. Traditional weeks begin on Sunday (usually) but may have weeks with less than 7 days. This results from having the 1st week of the year beginning on 1-Jan regardless of day of week. Thus the 1st and/or last weeks may have less than 7 days. ISO-8601 throws it own curve into the mix: the 1st week of the year begins on the week containing 4-Jan. Thus the last days of Dec may be in week 1 of the next year and the first days Jan may be in week 52/53 of the prior year.
All the below assume the ISO-8061.
Secondly there is no week function in Postgres. In you need extract function. So for this particular case:
select extract(week from creation_date)::integer as week, ...
Finally, your predicate (current_date - 30) ensures you will unusually not begin on the 1st of the week. To get the correct date take that result back 1 week, then go forward to the next Monday.
with days_to_monday (day_adj) as
( values ('{7,6,5,4,3,2,1}'::int[]) )
select current_date - 30
, current_date - 30 - 7 + day_adj[extract (isodow from current_date - 30 )]
from table_1 cross join days_to_monday;
The CTE establishes an array which for a given day of the week contains the number of days need to the next Monday. That main query extracts the day of week of current date and uses that to index the array. The corresponding value is added to get the proper date.
Putting that together with your original query to arrive at:
with next_week (monday) as
( values (current_date - 30 - 7
+ ('{7,6,5,4,3,2,1}'::int[])[extract (isodow from current_date - 30 )])
)
select extract(week from creation_date) as week,
count(*) as n
from table_1
where creation_date >= (select monday from next_week)
group by 1
order by 1;
For full example see fiddle.

How to group previously-denormalized-data from a row

I have a table containing courses run by teachers, I want to grab the number of taught days and split these by years and teachers' status.
The table contains the following fields:
id teacher_id course_name course_date course_duration teacher_status
--------------------------------------------------------------------------
1 Teacher_01 Course_AA 2012-02-01 2 volunteer
2 Teacher_02 Course_BB 2012-02-01 7 employee
3 Teacher_03 Course_BB 2013-02-01 7 contractor
4 Teacher_01 Course_AA 2014-02-01 2 paid volunteer
5 Teacher_04 Course_AA 2014-06-01 2 paid volunteer
Teachers may run a course under various statuses: volunteer, paid volunteer, contractor, employee, etc. The status of a given teacher can change through time. The duration of a course is expressed in days.
I can already gather the sum of taught days by teachers, split by status. This is done by
SELECT
teacher_status,
sum(course_duration) AS "Taught days"
FROM
my_table
GROUP BY
teacher_status
;
But data is not normalized and different families of statuses have been mixed. So I want to gather the same info (number of taught days) split:
by 3 statuses: volunteer, paid volunteer, all other statuses,
and by years.
What is expected is:
Year Teacher_status Taught_days
---------------------------------------
2012 volunteer 2
2012 employee 7
2013 contractor 7
2014 paid volunteer 4
I've tried various combinations of aggregate functions, GROUP BY / HAVING / ROLLUP statements but without success. How should I achieve this?
You'll want to select a complex expression and then GROUP BY that, not just by a raw column value. You could either repeat the expression or, in Postgres, also refer to the column alias:
SELECT
EXTRACT(year FROM course_date) as year,
(CASE teacher_status
WHEN 'volunteer' THEN 'volunteer'
WHEN 'paid volunteer' THEN 'paid'
ELSE 'other'
END) AS status,
SUM(course_duration) AS "Taught days"
FROM
my_table
GROUP BY
year,
status;
To get your example result, I have this query
SELECT extract (year from course_date),
teacher_status,
sum(course_duration) AS "Taught days"
FROM
my_table
GROUP BY
extract (year from course_date),
teacher_status;

Why are these two SQL date functions not providing the same answer

I've actually already solved the problem, but I'm trying to understand why the problem occurred because as far as I can see it has no reason to happen.
I have a rather large query that I run to prepare a table with some often used combinations. Generally, it only contains 2 years of data. Occasionally I will reconstruct it. While doing this I tweaked the query to add more information, but suddenly the result no longer matched up to the old query. Comparing the old to the new I noticed several missing orders. Amazingly, even after removing the tweaked parts the results still didn't match up.
I ultimately tracked the problem down to my WHERE clause, which was different from how I did it last time.
The type of the orderdate column I go over has type (datetime, null)
One of the orders that was omitted had this as date:
2018-12-23 20:58:52.383
An order that was included had this as date:
2019-01-28 15:20:49.107
It looks exactly the same to me.
The entire query is the same, except for the WHERE clause. My original where was:
WHERE DATEPART(yyyy,tbOrder.[OrderDate]) >= DATEPART(yyyy,GETDATE()-2)
My new where is now:
WHERE tborder.[OrderDate] >= DATEADD(yy, DATEDIFF(yy, 0, GETDATE())-2, 0)
Any help in understanding why the original where clause drops some lines would be greatly appreciated.
Because you are doing two different things. First predicate,
WHERE DATEPART(yyyy,tbOrder.[OrderDate]) >= DATEPART(yyyy,GETDATE()-2)
Take all order dates that are bigger than the year for the day before yesterday or two days before. Notice that, -2 is inside the brackets.
Second predicate,
WHERE tborder.[OrderDate] >= DATEADD( yy, DATEDIFF( yy, 0, GETDATE() ) - 2, 0)
Take all order dates bigger than two years before, i.e., datediff(yy,startdate,enddate) will return the year result of the difference between today and the initial value for date datatype, which is 1900-01-01. Then, add this, -2, to 1900-01-01. The second expression is the form of:
1900 + ( 201X - 1898 )
I simplified 1900 - 2 = 1898.
The two expressions return completely different things, so it shouldn't be a surprise the results are different. The first one returns the current year as a number (or the year of the day before yesterday to be precise). The second one returns January 1st two years ago.
You can put both expressions in a SELECT query to see what they return :
select DATEPART(yyyy,GETDATE()-2), DATEADD(yy, DATEDIFF(yy, 0, GETDATE()) - 2, 0)
The result is :
2019 2017-01-01 00:00:00.000
Both expressions are more complex than they need to be. The first condition will also harm performance because DATEPART(yyyy,tbOrder.[OrderDate]) prevents the server from using any indexes that cover OrderDate.
The question doesn't explain what you actually want to return. If you wanted to return all rows in the current year you can use :
Where
OrderDate >=DATEFROMPARTS( YEAR(GETDATE()) ,1,1) and
OrderDate < DATEFROMPARTS( YEAR(GETDATE()) + 1,1,1)
The same can be used to find rows two years in the past :
Where
OrderDate >= DATEFROMPARTS( YEAR(GETDATE()) -2 ,1,1) and
OrderDate < DATEFROMPARTS(YEAR(GETDATE()) - 1,1,1)
All rows since January 1st two years ago :
Where OrderDate >= DATEFROMPARTS( YEAR(GETDATE()) -2 ,1,1)
All those queries can take advantage of indexes that cover OrderDate.
Date range queries become a lot easier if you use a Calendar table. A Calendar table is a table that contains eg 50 or 100 years' worth of dates with extra columns for month, month day, week number, day of week, quarter, semester, month and day names, holidays, business reprorint periods, formatted short, long dates etc.
This makes yearly, monthly or weekly queries as easy as joining with the Calendar table and filtering based on the month or period you want.
In this case, retrieving rows two yeas in the past would look like :
From Orders inner Join Calendar on OrderDate=Calendar.Date
Where Calendar.Year=YEAR(GETDATE())-2
That may not looks so impressive but what about Q2 two years ago?
From Orders inner Join Calendar on OrderDate=Calendar.Date
Where Calendar.Year=YEAR(GETDATE())-2 and Quarter=2
Two years ago, same quarter
From Orders inner Join Calendar on OrderDate=Calendar.Date
Where Calendar.Year=YEAR(GETDATE())-2 and Quarter=DATEPART(q,GETDATE())
Retrieving totals for the current quarter for the last two years :
SELECT Year,Quarter,SUM(Total) QuarterTotal
From Orders inner Join Calendar on OrderDate=Calendar.Date
Where Calendar.Year > YEAR(GETDATE())-2 and Quarter=DATEPART(q,GETDATE())
GROUP BY Calendar.Year

TABLEAU Calculating a Running DISTINCT COUNT on usernames for last 3 months

Issue:
Need to show RUNNING DISTINCT users per 3-month interval^^. (See goal table as reference). However, “COUNTD” does not help even after table calculation or “WINDOW_COUNT” or “WINDOW_SUM” function.
^^RUNNING DISTINCT user means DISTINCT users in a period of time (Jan - Mar, Feb – Apr, etc.). The COUNTD option only COUNT DISTINCT users in a window. This process should go over 3-month window to find the DISTINCT users.
Original Table
Date Username
1/1/2016 A
1/1/2016 B
1/2/2016 C
2/1/2016 A
2/1/2016 B
2/2/2016 B
3/1/2016 B
3/1/2016 C
3/2/2016 D
4/1/2016 A
4/1/2016 C
4/2/2016 D
4/3/2016 F
5/1/2016 D
5/2/2016 F
6/1/2016 D
6/2/2016 F
6/3/2016 G
6/4/2016 H
Goal Table
Tried Methods:
Step-by-step:
Tried to distribute the problem into steps, but due to columnar nature of tableau, I cannot successfully run COUNT or SUM (any aggregate command) on the LAST STEP of the solution.
STEP 0 Raw Data
This tables show the structure Data, as it is in the original table.
STEP 1 COUNT usernames by MONTH
The table show the count of users by month. You will notice because user B had 2 entries he is counted twice. In the next step we use DISTINCT COUNT to fix this issue.
STEP 2 DISTINCT COUNT by MONTH
Now we can see who all were present in a month, next step would be to see running DISTINCT COUNT by MONTH for 3 months
STEP 3 RUNNING DISTINCT COUNT for 3 months
Now we can see the SUM of DISTINCT COUNT of usernames for running 3 months. If you turn the MONTH INTERVAL to 1 from 3, you can see STEP 2 table.
LAST STEP Issue Step
GOAL: Need the GRAND TOTAL to be the SUM of MONTH column.
Request:
I want to calculate the SUM of '1' by MONTH. However, I am using WINDOW function and aggregating the data that gave me an Error.
WHAT I NEED
Jan Feb March April May Jun
3 3 4 5 5 6
WHAT I GOT
Jan Feb March April May Jun
1 1 1 1 1 1
My Output after tried methods: Attached twbx file. DISTINCT_count_running_v1
HELP taken:
https://community.tableau.com/thread/119179 ; Tried this method but stuck at last step
https://community.tableau.com/thread/122852 ; Used some parts of this solution
The way I approached the problem was identifying the minimum login date for each user and then using that date to count the distinct number of users. For example, I have data in this format. I created a calculated field called Min User Login Date as { FIXED [User]:MIN([Date])} and then did a CNTD(USER) on Min User Login Date to get the unique user count by date. If you want running total, then you can do quick table calculation on Running Total on CNTD(USER) field.
You need to put Month(date) and count(username) in the columns then you will get result what you expect.
See screen below

How to Group By Week Across Year Boundary in TSQL

I use tsql to sum weekly data (starting date is Monday).
Last week spanned across the boundary of 2012 and 2013, which means that I get two weekly sums.
Is there a way to group the data within tsql so I get one weekly sum for 2012-12-31 to 2013-01-06?
Example data and sql:
http://sqlfiddle.com/#!6/52b25/13/0
No link example below. Hypothetical table has one column, "created" of datetime type. How many rows for each week? I'd like to get two result rows, one for the last solid week of 2012 and one for the week that 2013 starts in (which starts on Monday 12/31/2012).
select
min(created) as 'First Date',
count(*) as 'Count',
datepart(wk,created) as 'Week Number'
from items
group by datepart(wk,created), datepart(yy,created)
order by min(created)
One workaround is calculate week number manually
select
min(created) as 'First Date',
count(*) as 'Count',
datediff(dd,'2012-01-02',created)/7 AS week_number
from items
group by datediff(dd,'2012-01-02',created)/7 -- You may replace '2012-01-02' to any date that is Monday (start day of the week)
order by min(created)