GROUP BY in a JOIN? - tsql

I have a data table and a calendar table. The date table only contains the fields MonthName and FiscalYearName. My calendar table has one row per day for each year. How can I write the JOIN so that I don't get duplicated data, given that I cannot JOIN on a date? Right now I am just linking on those two fields, and I am getting duplicate data.

Could you elaborate on the exact issue?
If month and year are the only fields, why exactly do you need to join on Calendar? To get rid of duplicates, the easiest solution might be to use DISTINCT.
SELECT DISTINCT dat.*
FROM date dat
JOIN calendar cal
ON dat.year = cal.year
AND dat.month = cal.month
Edit: here is an extra query that returns just the MonthNum.
It uses a common table expression (CTE) that gets each distinct month name and its number only once, then joins the date table to this CTE.
WITH cteCalendar AS
(
SELECT DISTINCT MonthName, MonthNum FROM calendar
)
SELECT cte.MonthNum, dat.*
FROM date dat
JOIN cteCalendar cte
ON dat.MonthName = cte.MonthName
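To see why collapsing the calendar first matters, here is a minimal sketch run against SQLite through Python's sqlite3 module (chosen only so it is runnable; the original question is T-SQL). The table name data_table, the CalDate and Amount columns, and the sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE calendar (CalDate TEXT, MonthName TEXT, MonthNum INTEGER);
CREATE TABLE data_table (MonthName TEXT, FiscalYearName TEXT, Amount INTEGER);
INSERT INTO calendar VALUES
  ('2023-01-01', 'January', 1), ('2023-01-02', 'January', 1),
  ('2023-02-01', 'February', 2), ('2023-02-02', 'February', 2);
INSERT INTO data_table VALUES
  ('January', 'FY2023', 100), ('February', 'FY2023', 200);
""")

# Joining straight to the daily calendar repeats each data row once per day.
naive = conn.execute("""
SELECT d.*
FROM data_table d
JOIN calendar c ON d.MonthName = c.MonthName
""").fetchall()

# Collapsing the calendar to one row per month first keeps the join 1:1.
deduped = conn.execute("""
WITH cteCalendar AS (
    SELECT DISTINCT MonthName, MonthNum FROM calendar
)
SELECT cte.MonthNum, d.MonthName, d.FiscalYearName, d.Amount
FROM data_table d
JOIN cteCalendar cte ON d.MonthName = cte.MonthName
ORDER BY cte.MonthNum
""").fetchall()

print(len(naive), len(deduped))
```

With two calendar days per month in the sample data, the naive join returns each data row twice, while the CTE version returns it once.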

Related

How to extend dynamic schema with views in Hasura and Postgres?

I have been struggling for a few days to extend the schema with a custom group-by, using something like this.
I have a table with a few fields such as id, country, ip, and created_at.
I am trying to get the rows as groups: for example, grouped by date, by hour of a given date, by country, or by country with DISTINCT ip.
Honestly, I have almost no SQL experience, but I played around to get what I want. Here are some examples.
SELECT Hour(created_at) AS date,
COUNT(*) AS count
FROM session where CAST(created_at AS date) = '2021-04-05'
GROUP BY Hour(created_at)
ORDER BY date;
SELECT country,
count(*) AS count from (SELECT * FROM session where CAST(created_at AS date) <= '2021-05-12' GROUP BY created_at) AS T1
GROUP BY country;
SELECT country, COUNT(*) as count
FROM (SELECT DISTINCT ip, country FROM session) AS T1
GROUP BY country;
SELECT DATE(created_at) AS date,
COUNT(*) AS count
FROM session
GROUP BY DATE(created_at)
ORDER BY date;
Now I am struggling with two things.
How do I make the date as variables? I mean, if I want to group them for a particular date range/ or today's data hourly, or per quarter gap (more of configurable), how do I add the variables in Hasura's Raw SQL?
Also for this approach I have to add schema for each one of them? Like this
CREATE
OR REPLACE VIEW "public"."unique_session_counts_date" AS
SELECT
date(session.created_at) AS date,
count(*) AS count
FROM
session
GROUP BY
(date(session.created_at))
ORDER BY
(date(session.created_at));
Is there a way to make it more generalized? What I mean is, if it was in Node.js I could have done something like
return rawQuery(
`
select ${field} x, count(*) y
from ${table}
where website_id=$1
and created_at between $2 and $3
${domainFilter}
${urlFilter}
group by 1
order by 2 desc
`,
params,
);
In this case, based on whatever field and where clause I send, one query would do the trick for me. Can I do something similar in Hasura?
Thank you so much in advance.
How do I make the date as variables? I mean, if I want to group them for a particular date range/ or today's data hourly, or per quarter gap (more of configurable), how do I add the variables in Hasura's Raw SQL?
My first thought is this. If you're thinking about passing in variables via GraphQL, for example, the query would look something like:
query MyQuery {
unique_session_counts_date(where: {created_at: {_gte: "<start date here>", _lte: "<end date here>"}}) {
<...any fields, rollups, etc here...>
}
}
The underlying view/query would follow the GROUP BY and ORDER BY that you've detailed. You could then submit the GraphQL query and just pass in the pertinent parameters, like the $1, $2, and $3 in the rawQuery call.
Also for this approach I have to add schema for each one of them?
The schema? Or the view? I don't think a view would specifically be required: if a multilevel SELECT or similar query can handle it and performs well, a view isn't particularly needed.
That's my first stab at the problem. I'm going to try to work through this problem in a few hours on a Twitch stream @HasuraHQ; if you can join, I'm happy to walk through it live.

Add dates ranges to a table for individual values using a cursor

I have a calendar table called CalendarInformation that gives me a list of dates from 2015 to 2025. This table has a column called BusinessDay that shows which dates are weekends or holidays. I have another table called OpenProblemtimeDiffTable with a column called number for my problem number, a date column called ProblemNew for when the problem was opened, and another date column called Now for the current date. For each problem number, I want to take its date range, find the calendar dates in between, and sum them up to get the number of business days. Then I want to insert these values into another table, with the problem number associated with its business-day count.
Thanks in advance and I hope I was clear.
TRUNCATE TABLE ProblemsMoreThan7BusinessDays
DECLARE @date AS date
DECLARE @businessday AS INT
DECLARE @Startdate as DATE, @EndDate as DATE
DECLARE CONTACT_CURSOR CURSOR FOR
SELECT date, businessday
FROM CalendarInformation
OPEN contact_cursor
FETCH NEXT FROM Contact_cursor INTO @date, @businessday
WHILE (@@FETCH_STATUS=0)
BEGIN
SELECT @enddate= now FROM OpenProblemtimeDiffTable
SELECT @Startdate= problemnew FROM OpenProblemtimeDiffTable
SET @Date=@Startdate
PRINT @enddate
PRINT @startdate
SELECT @businessday= SUM (businessday) FROM CalendarInformation WHERE date > @startdate AND date <= @Enddate
INSERT INTO ProblemsMoreThan7BusinessDays (businessdays, number)
SELECT @businessday, number
FROM OpenProblemtimeDiffTable
FETCH NEXT FROM CONTACT_CURSOR INTO @date, @businessday
END
CLOSE CONTACT_CURSOR
DEALLOCATE CONTACT_CURSOR
I tried this code using a cursor and I'm close, but I cannot get the date ranges to change for each row.
So if I have a problem number with a date range from 02-07-2018 to 05-20-2019, I want my new table to hold the sum of business days from the calendar along with the problem number. My output would be the number column PROB0421 and businessdays (with the correct sum). The next problem, PROB0422, has a date range of 11-6-18 to 5-20-19, so my output would be PROB0422 with the correct sum of business days.
Rather than doing this with a cursor, you should approach this in a set-based manner. That you already have a calendar table makes this a lot easier. The basic approach is to select from your data table and join to your calendar table to return all the calendar rows that fall within your date range. From there you can aggregate as you require.
This would look something like the below, though apply it to your situation and adjust as required:
select p.ProblemNew
,p.Now
,sum(c.BusinessDay) as BusinessDays
from dbo.Problems as p
join dbo.calendar as c
on c.CalendarDate between p.ProblemNew and p.Now
and c.BusinessDay = 1
group by p.ProblemNew
,p.Now
I think you can do this without a cursor. It should only require a single insert..select statement.
I assume your "businessday" column is just a bit or flag-type field that is 1 if the date is a business day and 0 if not? If so, this should work (or something close to it, if I'm not understanding your environment properly):
insert ProblemsMoreThan7BusinessDays
(
number
, businessdays
)
select
op.number
, sum( ci.businessday ) -- or count(*)
from OpenProblemtimeDiffTable op
inner join CalendarInformation ci on ci.[date] > op.ProblemNew
and ci.[date] <= op.[now]
and ci.businessday = 1
group by
op.number
I usually try to avoid cursors and working with data in a procedural manner, especially if I can handle the task as above. Don't think of the data as thousands of individual rows; think of it as just two sets of data. How do they relate?
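The set-based idea can be tried end to end with a small sketch. It runs on SQLite through Python's sqlite3 module rather than SQL Server, and the sample rows are invented for illustration; the table and column names follow the question, and the range uses the asker's own bounds (after ProblemNew, up to and including Now).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE CalendarInformation (date TEXT PRIMARY KEY, businessday INTEGER);
CREATE TABLE OpenProblemtimeDiffTable (number TEXT, ProblemNew TEXT, Now TEXT);
CREATE TABLE ProblemsMoreThan7BusinessDays (number TEXT, businessdays INTEGER);
-- Monday 2019-05-13 through Monday 2019-05-20; the 18th and 19th are a weekend.
INSERT INTO CalendarInformation VALUES
  ('2019-05-13', 1), ('2019-05-14', 1), ('2019-05-15', 1), ('2019-05-16', 1),
  ('2019-05-17', 1), ('2019-05-18', 0), ('2019-05-19', 0), ('2019-05-20', 1);
INSERT INTO OpenProblemtimeDiffTable VALUES
  ('PROB0421', '2019-05-13', '2019-05-20'),
  ('PROB0422', '2019-05-16', '2019-05-20');
""")

# One INSERT..SELECT replaces the cursor: each problem row joins to every
# calendar row inside its own date range, then the flags are summed per problem.
conn.execute("""
INSERT INTO ProblemsMoreThan7BusinessDays (number, businessdays)
SELECT op.number, SUM(ci.businessday)
FROM OpenProblemtimeDiffTable op
JOIN CalendarInformation ci
  ON ci.date > op.ProblemNew
 AND ci.date <= op.Now
GROUP BY op.number
""")

result = conn.execute(
    "SELECT number, businessdays FROM ProblemsMoreThan7BusinessDays ORDER BY number"
).fetchall()
print(result)
```

Because the date range is part of the join condition, every problem gets its own range, which is the part the cursor version could not get right.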

fetch data from and to date to get all matching results

Hello everyone. I need to fetch data between a from date and a to date. I tried using a BETWEEN clause, but it fails to retrieve the data I need. Here is what I need.
I have a table called hall_info which has the following structure:
hall_info
id | hall_name |address |contact_no
1 | abc | India |XXXX-XXXX-XX
2 | xyz | India |XXXX-XXXX-XX
Now I have one more table which is events, that contains data about when and which hall is booked on what date, the structure is as follows.
id |hall_info_id |event_date(booked_date)| event_name
1 | 2 | 2015-10-25 | Marriage
2 | 1 | 2015-10-28 | Marriage
3 | 2 | 2015-10-26 | Marriage
What I need now is to show the hall names that are not booked on the selected dates. Suppose the user chooses 2015-10-23 to 2015-10-30: I want to list all halls that are not booked on every date in that range. In the case above, both halls (hall_info_id 1 and 2) have bookings in the given range, but I still want to show them because they are free on other dates, such as the 23rd, 24th, 27th, and 29th.
In a second case, suppose the user chooses 2015-10-25 to 2015-10-26. Then hall_info_id 2 is booked on both dates, the 25th and the 26th, so I want to show only hall_info_id 1.
I tried using an inner query and a BETWEEN clause, but I am not getting the required result. To keep it simple I have shown only selected fields; I have more tables to join, so I can't paste my full query. Please help with this. Thanks in advance to everyone who tries.
Some changes in Yasen Zhelev's code:
SELECT * FROM hall_info
WHERE id not IN (
SELECT hall_info_id FROM events
WHERE event_date >= '2015-10-23' AND event_date <= '2015-10-30'
GROUP BY hall_info_id
HAVING COUNT(DISTINCT event_date) > DATE_PART('day', '2015-10-30'::timestamp - '2015-10-23'::timestamp))
I have not tried it but how about checking if the number of bookings per hall is less than the actual days in the selected period.
SELECT * FROM hall_info WHERE id NOT IN
(SELECT hall_info_id FROM events
WHERE event_date >= '2015-10-23' AND event_date <= '2015-10-30'
GROUP BY hall_info_id
HAVING COUNT(id) > DATEDIFF(day, '2015-10-23', '2015-10-30')
);
That will only work if you have one booking per day per hall.
To get the "available dates" for the hall returned, your query needs a row source of all possible dates. For example, if you had a calendar table populated with possible date values, e.g.
CREATE TABLE cal (dt DATE NOT NULL PRIMARY KEY) Engine=InnoDB
;
INSERT INTO cal (dt) VALUES ('2015-10-23')
,('2015-10-24'),('2015-10-25'),('2015-10-26'),('2015-10-27')
,('2015-10-28'),('2015-10-29'),('2015-10-30'),('2015-10-31')
;
Then you could use a query that performs a cross join between the calendar table and hall_info (to get every hall on every date), plus an anti-join pattern to eliminate rows that are already booked.
The anti-join pattern is an outer join with a restriction in the WHERE clause to eliminate matching rows.
For example:
SELECT cal.dt, h.id, h.hall_name, h.address
FROM cal cal
CROSS
JOIN hall_info h
LEFT
JOIN events e
ON e.hall_info_id = h.id
AND e.event_date = cal.dt
WHERE e.id IS NULL
AND cal.dt >= '2015-10-23'
AND cal.dt <= '2015-10-30'
The cross join between cal and hall_info gets all halls for all dates (restricted in the WHERE clause to a specified range of dates).
The outer join to events finds matching rows in the events table (matching on hall_info_id and event_date). The trick is the predicate (condition) in the WHERE clause, e.id IS NULL. That throws out any rows that had a match, leaving only rows that don't have a match.
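The cross join plus anti-join pattern can be checked against the sample data from the question. This is a minimal sketch on SQLite through Python's sqlite3 module, not MySQL; the free_slots helper is a hypothetical name, and the events join uses the hall_info_id column from the question's table layout.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE hall_info (id INTEGER PRIMARY KEY, hall_name TEXT);
CREATE TABLE events (id INTEGER PRIMARY KEY, hall_info_id INTEGER,
                     event_date TEXT, event_name TEXT);
CREATE TABLE cal (dt TEXT NOT NULL PRIMARY KEY);
INSERT INTO hall_info VALUES (1, 'abc'), (2, 'xyz');
INSERT INTO events VALUES
  (1, 2, '2015-10-25', 'Marriage'),
  (2, 1, '2015-10-28', 'Marriage'),
  (3, 2, '2015-10-26', 'Marriage');
INSERT INTO cal VALUES
  ('2015-10-23'), ('2015-10-24'), ('2015-10-25'), ('2015-10-26'),
  ('2015-10-27'), ('2015-10-28'), ('2015-10-29'), ('2015-10-30');
""")

def free_slots(start, end):
    """Return (date, hall id, hall name) for every unbooked hall/date pair."""
    return conn.execute("""
        SELECT cal.dt, h.id, h.hall_name
        FROM cal
        CROSS JOIN hall_info h
        LEFT JOIN events e
               ON e.hall_info_id = h.id
              AND e.event_date = cal.dt
        WHERE e.id IS NULL              -- keep only pairs with no booking
          AND cal.dt BETWEEN ? AND ?
        ORDER BY cal.dt, h.id
    """, (start, end)).fetchall()

print(free_slots('2015-10-25', '2015-10-26'))
```

For the narrow range only hall 1 appears, since hall 2 is booked on both days; for the full range of 23 to 30 October both halls appear on their free dates.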
This type of problem is similar to other "sparse data" problems. e.g. How do you return a zero total for sales by a given store on a given date, when there are no rows with that store and date...
In your case, the query needs a source of rows with available date values. That doesn't necessarily have to be a table named calendar. (Other databases give us the ability to dynamically generate a row source; someday, MySQL may have similar features.)
If you want the row source to be dynamic in MySQL, then one approach would be to create a temporary table, populate it with the dates, run the query referencing the temporary table, and then drop the temporary table.
Another approach is to use an inline view to return the rows...
SELECT cal.dt, h.id, h.hall_name, h.address
FROM (
SELECT '2015-10-23'+INTERVAL 0 DAY AS dt
UNION ALL SELECT '2015-10-24'
UNION ALL SELECT '2015-10-25'
UNION ALL SELECT '2015-10-26'
UNION ALL SELECT '2015-10-27'
UNION ALL SELECT '2015-10-28'
UNION ALL SELECT '2015-10-29'
UNION ALL SELECT '2015-10-30'
) cal
CROSS
JOIN hall_info h
LEFT
JOIN events e
ON e.hall_info_id = h.id
AND e.event_date = cal.dt
WHERE e.id IS NULL
FOLLOWUP: When this question was originally posted, it was tagged with mysql. The SQL in the examples above is for MySQL.
In terms of writing a query to return the specified results, the general issue is still the same in PostgreSQL. The general problem is "sparse data".
The SQL query needs a row source for the "missing" date values, but the specification doesn't provide any source for those date values.
The answer above discusses several possible row sources in MySQL: 1) a table, 2) a temporary table, 3) an inline view.
The answer also mentions that some databases (not MySQL) provide other mechanisms that can be used as a row source.
For example, PostgreSQL provides a nifty generate_series function (Reference: http://www.postgresql.org/docs/9.1/static/functions-srf.html).
It should be possible to use the generate_series function as a row source, to supply a set of rows containing the date values needed by the query to produce the specified result.
This answer demonstrates the approach to solving the "sparse data" problem.
If the specification is to return just the list of halls, and not the dates they are available, the queries above can be easily modified to remove the date expression from the SELECT list, and add a GROUP BY clause to collapse the rows into a distinct list of halls.
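As a sketch of a dynamically generated row source combined with the GROUP BY variant, the example below builds the date list with a recursive CTE. This is a stand-in technique, not generate_series itself: it runs on SQLite through Python's sqlite3 module, and the available_halls helper is a hypothetical name. The sample rows are the ones from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE hall_info (id INTEGER PRIMARY KEY, hall_name TEXT);
CREATE TABLE events (id INTEGER PRIMARY KEY, hall_info_id INTEGER,
                     event_date TEXT, event_name TEXT);
INSERT INTO hall_info VALUES (1, 'abc'), (2, 'xyz');
INSERT INTO events VALUES
  (1, 2, '2015-10-25', 'Marriage'),
  (2, 1, '2015-10-28', 'Marriage'),
  (3, 2, '2015-10-26', 'Marriage');
""")

def available_halls(start, end):
    """Return names of halls with at least one free date in [start, end]."""
    rows = conn.execute("""
        WITH RECURSIVE cal(dt) AS (
            SELECT :start                        -- first date of the range
            UNION ALL
            SELECT date(dt, '+1 day') FROM cal   -- next day, until the end date
            WHERE dt < :end
        )
        SELECT h.hall_name
        FROM cal
        CROSS JOIN hall_info h
        LEFT JOIN events e
               ON e.hall_info_id = h.id
              AND e.event_date = cal.dt
        WHERE e.id IS NULL
        GROUP BY h.id, h.hall_name               -- collapse dates to one row per hall
        ORDER BY h.id
    """, {"start": start, "end": end}).fetchall()
    return [name for (name,) in rows]

print(available_halls('2015-10-25', '2015-10-26'))
print(available_halls('2015-10-23', '2015-10-30'))
```

No calendar table is needed here because the date rows exist only for the duration of the query.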

How do I get matching rows Left Outer Join

This time I thought I had it figured out; but how can my addled brain explain this. No, for this I need the experts.
According to Jeff Atwood's "A Visual Explanation of SQL Joins", a left outer join produces a complete set of records from Table A, with the matching records (where available) in Table B.
SELECT R.[Computer]
,L.[User]
,L.MaxDate
,R.[Notes]
,R.[ID]
From (
SELECT [User], max([StartDate]) as MaxDate
FROM <Table1>
Group by [User]
) As L
Left Outer Join <Table1> as R --Self join
on L.MaxDate = R.StartDate
MaxDate on the left always returns only one date for each User. This should be matched by exactly one matching row on the right. Or so I thought. I am getting multiple items for each date and user.
The purpose here is to return all the columns for each user using MaxDate to get the most recent date for each user. As the dates are unique, I should only get one row for each user, but instead I get several.
How do I limit the result set to the single matching row based on L.MaxDate = R.StartDate?
You get multiple matches from R if the same StartDate is found for multiple users. Add User to your join condition.
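A small sketch shows the difference the extra join condition makes. It runs on SQLite through Python's sqlite3 module; the table name Table1 stands in for the placeholder in the question, and the sample rows are invented so that two users share the same latest StartDate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Table1 (ID INTEGER PRIMARY KEY, Computer TEXT,
                     "User" TEXT, StartDate TEXT, Notes TEXT);
INSERT INTO Table1 VALUES
  (1, 'PC1', 'alice', '2020-01-01', 'old'),
  (2, 'PC2', 'alice', '2020-01-05', 'latest for alice'),
  (3, 'PC3', 'bob',   '2020-01-05', 'latest for bob'),
  (4, 'PC4', 'bob',   '2020-01-02', 'old');
""")

QUERY = """
SELECT R.Computer, L."User", L.MaxDate, R.Notes, R.ID
FROM (
    SELECT "User", MAX(StartDate) AS MaxDate
    FROM Table1
    GROUP BY "User"
) AS L
LEFT OUTER JOIN Table1 AS R
  ON L.MaxDate = R.StartDate {extra}
ORDER BY L."User", R.ID
"""

# Date-only join: each user's MaxDate also matches the other user's row.
date_only = conn.execute(QUERY.format(extra="")).fetchall()

# Adding the user to the join condition leaves one row per user.
with_user = conn.execute(
    QUERY.format(extra='AND L."User" = R."User"')
).fetchall()

print(len(date_only), len(with_user))
```

The dates are unique per user but not across users, which is why the date alone is not enough to identify the row.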

How do I order my query by a field and still group by a subset of that field in db2?

Sorry if the title is confusing. Here is the query I have
Select MONTH(DATE(TIMESTAMP)), SUM(FIELD1), SUM(FIELD2) from TABLE WHERE TIMESTAMP BETWEEN '2009-07-26 00:00:00' AND '2010-02-24 23:59:59' GROUP BY MONTH(DATE(TIMESTAMP))
This will let me get the month number out of the query. The problem is that right now it is sorting the months 1,2,3,4.... when it spans two separate years. I need to be able to sort this query by year then month.
If I add "ORDER BY TIMESTAMP" at the end of my query I get this error:
Column TIMESTAMP or expression in SELECT list not valid. SQLCODE=-122
Also, I changed the field names for this question to keep it clear; the field isn't actually called TIMESTAMP.
You need to group by year, then month:
SELECT YEAR(YourField),
Month(YourField),
SUM(Field1),
SUM(Field2)
FROM Table
WHERE...
GROUP BY
YEAR(YourField),
Month(YourField)
ORDER BY
YEAR(YourField),
Month(YourField)
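To illustrate the ordering, here is a minimal sketch on SQLite through Python's sqlite3 module. SQLite has no YEAR() or MONTH() functions, so strftime is used as a stand-in for them; the table name readings, the column name ts, and the sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE readings (ts TEXT, field1 INTEGER, field2 INTEGER);
INSERT INTO readings VALUES
  ('2009-08-15 10:00:00', 1, 10),
  ('2009-12-01 09:30:00', 2, 20),
  ('2010-01-10 08:00:00', 3, 30),
  ('2010-01-20 17:45:00', 4, 40),
  ('2010-02-05 12:00:00', 5, 50);
""")

# Grouping and ordering by (year, month) keeps December 2009 ahead of
# January 2010, which a month-only grouping cannot do.
totals = conn.execute("""
SELECT CAST(strftime('%Y', ts) AS INTEGER) AS yr,
       CAST(strftime('%m', ts) AS INTEGER) AS mon,
       SUM(field1),
       SUM(field2)
FROM readings
WHERE ts BETWEEN '2009-07-26 00:00:00' AND '2010-02-24 23:59:59'
GROUP BY yr, mon
ORDER BY yr, mon
""").fetchall()

print(totals)
```

Because the ORDER BY uses the same expressions as the GROUP BY, it avoids the error raised when ordering by the raw timestamp column.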