I have a booking system in which I need to select any available room from the database. The basic setup is:
table: room
columns: id, maxGuests
table: roombooking
columns: id, startDate, endDate, confirmed
table: roombooking_room
columns: id, room_id, roombooking_id
I need to select rooms that can fit the requested number of guests, or select two (or more) rooms that together fit them (as defined by maxGuests, obviously using the lowest/closest maxGuests first).
I could loop through my date range and run this SQL for each date:
SELECT `id`
FROM `room`
WHERE `id` NOT IN
(
SELECT `roombooking_room`.`room_id`
FROM `roombooking_room`, `roombooking`
WHERE `roombooking`.`confirmed` =1
AND DATE(%s) BETWEEN `roombooking`.`startDate` AND `roombooking`.`endDate`
)
AND `room`.`maxGuests`>=%d
Where %s is the looped date and %d is the number of guests to be booked in. But this just returns nothing if there are more guests than any single room can take, and there must be a quicker way of doing this than looping in PHP and running the query for every date?
This is similar to part of the sql I was thinking of: Getting Dates between a range of dates but with Mysql
Solution, based on ircmaxwell's answer:
$query = sprintf(
"SELECT `id`, `maxGuests`
FROM `room`
WHERE `id` NOT IN
(
SELECT `roombooking_room`.`room_id`
FROM `roombooking_room`
JOIN `roombooking` ON `roombooking_room`.`roombooking_id` = `roombooking`.`id`
WHERE `roombooking`.`confirmed` =1
AND (`roombooking`.`startDate` <= DATE('%s') AND `roombooking`.`endDate` >= DATE('%s'))
)
AND `maxGuests` <= %d ORDER BY `maxGuests` DESC",
$endDate->toString('yyyy-MM-dd'), $startDate->toString('yyyy-MM-dd'), $noGuests);
$result = $db->query($query);
$result = $result->fetchAll();
$rooms = array();
$guests = 0;
foreach($result as $res) {
if($guests >= $noGuests) break;
$guests += (int)$res['maxGuests'];
$rooms[] = $res['id'];
}
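If the database supports window functions (MySQL 8+), the same greedy accumulation the foreach performs could be pushed into SQL with a running total. This is only a sketch under that assumption; @startDate, @endDate and @noGuests stand in for the values the sprintf call binds:
-- sketch only, not part of the original solution (assumes MySQL 8+)
WITH free_rooms AS (
    SELECT `id`, `maxGuests`
    FROM `room`
    WHERE `id` NOT IN (
        SELECT `roombooking_room`.`room_id`
        FROM `roombooking_room`
        JOIN `roombooking` ON `roombooking_room`.`roombooking_id` = `roombooking`.`id`
        WHERE `roombooking`.`confirmed` = 1
          AND `roombooking`.`startDate` <= @endDate
          AND `roombooking`.`endDate` >= @startDate
    )
    AND `maxGuests` <= @noGuests
),
ranked AS (
    SELECT `id`, `maxGuests`,
           SUM(`maxGuests`) OVER (ORDER BY `maxGuests` DESC, `id`
                                  ROWS UNBOUNDED PRECEDING) AS running_total
    FROM free_rooms
)
SELECT `id`, `maxGuests`
FROM ranked
WHERE running_total - `maxGuests` < @noGuests;  -- keep adding rooms until the guests fit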
Assuming that you are interested in placing @Guests guests from @StartDate to @EndDate:
SELECT DISTINCT r.id
FROM room r
LEFT JOIN roombooking_room rbr ON r.id = rbr.room_id
LEFT JOIN roombooking rb ON rbr.roombooking_id = rb.id
WHERE COALESCE(@StartDate NOT BETWEEN rb.startDate AND rb.endDate, TRUE)
AND COALESCE(@EndDate NOT BETWEEN rb.startDate AND rb.endDate, TRUE)
AND @Guests <= r.maxGuests
should give you a list of all rooms that are free and can accommodate the given number of guests for the given period.
NOTES
This query works only for single rooms; if you want to look at multiple rooms you will need to apply the same criteria to a combination of rooms. For this you would need recursive queries or some helper tables.
Also, COALESCE is there to take care of NULLs. If a room is not booked at all it has no booking dates to compare against, so without it the query would not return completely free rooms: BETWEEN returns NULL if either bound is NULL, and COALESCE turns that into TRUE (an alternative is to UNION in the completely free rooms, which might be faster).
With multiple rooms things get really interesting.
Is that scenario a big part of your problem? And which database are you using, i.e. do you have access to recursive queries?
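For what it's worth, if recursive CTEs are available, the multi-room combination search can be sketched roughly as below. This is a MySQL 8+ flavoured sketch only; @StartDate, @EndDate and @Guests are placeholders, and the schema is the one from the question:
WITH RECURSIVE free_rooms AS (
    SELECT r.id, r.maxGuests
    FROM room r
    WHERE r.id NOT IN (
        SELECT rbr.room_id
        FROM roombooking_room rbr
        JOIN roombooking rb ON rbr.roombooking_id = rb.id
        WHERE rb.confirmed = 1
          AND rb.startDate <= @EndDate
          AND rb.endDate >= @StartDate
    )
),
combos AS (
    -- start with every free room on its own
    SELECT id AS last_id, maxGuests AS total, CAST(id AS CHAR(200)) AS room_ids
    FROM free_rooms
    UNION ALL
    -- extend a combination with a higher-numbered room while it is still too small
    SELECT f.id, c.total + f.maxGuests, CONCAT(c.room_ids, ',', f.id)
    FROM combos c
    JOIN free_rooms f ON f.id > c.last_id
    WHERE c.total < @Guests
)
SELECT room_ids, total
FROM combos
WHERE total >= @Guests
ORDER BY total   -- smallest sufficient capacity, i.e. the best fit
LIMIT 1;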
EDIT
As I stated multiple times before, your way of looking for a solution (a greedy algorithm that takes the largest free rooms first) is not optimal if you want the best fit between the required number of guests and the rooms.
So, if you replace your foreach with
$bestCapacity = 0;
$bestSolution = array();
for ($i = 1; $i <= pow(2, sizeof($result)) - 1; $i++) {
    $solutionIdx = $i;
    $solutionGuests = 0;
    $solution = array();
    $j = 0;
    while ($solutionIdx > 0) {
        if ($solutionIdx % 2 == 1) {
            $solution[] = $result[$j]['id'];
            $solutionGuests += $result[$j]['maxGuests'];
        }
        $solutionIdx = intval($solutionIdx / 2);
        $j++;
    }
    if (($solutionGuests <= $bestCapacity || $bestCapacity == 0) && $solutionGuests >= $noGuests) {
        $bestCapacity = $solutionGuests;
        $bestSolution = $solution;
    }
}
print_r($bestSolution);
print_r($bestCapacity);
it will go through all possible combinations and find the one that wastes the fewest spaces. For example, with free rooms for 2, 3 and 4 guests and 5 guests to place, the greedy pick books the 4-room and then the 3-room (7 beds), while the exhaustive search finds the exact fit of 2 + 3.
Ok, first off, the inner query you're using is a Cartesian join, and will be VERY expensive. You need to specify join criteria (roombooking_room.roombooking_id = roombooking.id, for example).
Secondly, assuming that you have a range of dates, what can we say about that? Well, let's call the start of your range rangeStartDate and the end rangeEndDate.
Now, what can we say about any other range of dates that does not have any form of overlap with this range? Well, its endDate must not be between the rangeStartDate and the rangeEndDate. Same with its startDate. And the rangeStartDate (and rangeEndDate, but we don't need to check it) cannot be between startDate and endDate...
So, assuming %1$s is rangeStartDate and %2$s is rangeEndDate, a comprehensive where clause might be:
WHERE `roombooking`.`startDate` NOT BETWEEN %1$s AND %2$s
AND `roombooking`.`endDate` NOT BETWEEN %1$s AND %2$s
AND %1$s NOT BETWEEN `roombooking`.`startDate` AND `roombooking`.`endDate`
But, there's a simpler way of saying that. The only way for one range to be entirely outside another is for its start date to be after the other's end date, or its end date to be before the other's start date.
So, assuming %1$s is rangeStartDate and %2$s is rangeEndDate, another comprehensive where clause might be:
WHERE `roomBooking`.`startDate` > %2$s
OR `roomBooking`.`endDate` < %1$s
The subquery, though, needs to list the rooms whose bookings do overlap the requested range (so they can be excluded), which is just the negation of that test. That brings your overall query to:
SELECT `id`
FROM `room`
WHERE `id` NOT IN
(
SELECT `roombooking_room`.`room_id`
FROM `roombooking_room`
JOIN `roombooking` ON `roombooking_room`.`roombooking_id` = `roombooking`.`id`
WHERE `roombooking`.`confirmed` =1
AND (`roombooking`.`startDate` <= %2$s
AND `roombooking`.`endDate` >= %1$s)
)
AND `room`.`maxGuests`>=%d
There are other ways of doing this as well, so keep looking...
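One such alternative, just as a sketch, uses NOT EXISTS with the same %1$s/%2$s placeholders plus an assumed %3$d for the guest count:
SELECT `r`.`id`
FROM `room` AS `r`
WHERE `r`.`maxGuests` >= %3$d
AND NOT EXISTS
(
    SELECT 1
    FROM `roombooking_room` AS `rbr`
    JOIN `roombooking` AS `rb` ON `rbr`.`roombooking_id` = `rb`.`id`
    WHERE `rbr`.`room_id` = `r`.`id`
    AND `rb`.`confirmed` = 1
    AND `rb`.`startDate` <= %2$s
    AND `rb`.`endDate` >= %1$s
)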
SELECT rooms.id
FROM rooms LEFT JOIN bookings
ON bookings.room_id = rooms.id
AND <booking overlaps date range of interest>
WHERE <wherever else>
GROUP BY rooms.id
HAVING COUNT(bookings.id) = 0
I might be misremembering exactly how the LEFT JOIN interacts with the grouping, so you might need a slightly different condition on the HAVING, maybe a count or a sum.
At worst, with suitable indexes, that should scan half the bookings.
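Filled in against the schema from the question (the overlap test and the @ placeholders are assumptions added for illustration, not part of this answer), that sketch could look like:
SELECT r.id
FROM room r
LEFT JOIN roombooking_room rbr
    ON rbr.room_id = r.id
LEFT JOIN roombooking rb
    ON rb.id = rbr.roombooking_id
    AND rb.confirmed = 1
    AND rb.startDate <= @EndDate   -- booking overlaps the requested range
    AND rb.endDate >= @StartDate
WHERE r.maxGuests >= @Guests
GROUP BY r.id
HAVING COUNT(rb.id) = 0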
Related
I need to write a query that uses EXISTS, rather than IN, so that it will run fast. The filter is being fed so many parameter values that EXISTS seems like the only option. The difference is between a 20+ minute query and a 5 second query.
This is the query I have:
SELECT DISTINCT d.GROUP_NAME
FROM [EMPLOYEE] e JOIN [DATA_FACT] d ON (e.KEY = d.KEY)
WHERE d.DATE BETWEEN @Start and @End
AND EXISTS
(
select '1234567' -- @ID
)
AND e.Location IN (@Location)
ORDER BY d.GROUP_NAME ASC
The problem is that it is returning too many records. Based on the values I'm passing to filter on, I should get 1 row back but instead I am getting 28.
If I remove the EXISTS and add the following then I get the 1 record I need:
AND e.ID IN ('1234567')
Is there a way to fix the query to work with EXISTS so that I get the correct results?
This is essentially what you want if you are going to use EXISTS to filter your DATA_FACT table by parameters from your EMPLOYEE table. I'm not sure how much it will improve performance, though, once you throw a massive number of employee IDs at it.
SELECT
d.GROUP_NAME
FROM [DATA_FACT] AS d
WHERE d.DATE BETWEEN @Start and @End
AND EXISTS
(
select 1
from EMPLOYEE AS e
WHERE d.[KEY] = e.[KEY]
AND e.[Location] IN (@Location)
AND e.ID IN ('1234567')
)
ORDER BY d.GROUP_NAME ASC
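If the parameter list really is huge, one common alternative is to load the IDs into a temp table and join to it inside the EXISTS. This is only a sketch, not part of the answer above; the #FilterIds table and the VARCHAR(20) type are assumptions:
CREATE TABLE #FilterIds (ID VARCHAR(20) PRIMARY KEY);
INSERT INTO #FilterIds (ID) VALUES ('1234567');  -- bulk insert the real list here

SELECT DISTINCT d.GROUP_NAME
FROM [DATA_FACT] AS d
WHERE d.DATE BETWEEN @Start AND @End
AND EXISTS
(
    SELECT 1
    FROM [EMPLOYEE] AS e
    JOIN #FilterIds AS f ON f.ID = e.ID
    WHERE e.[KEY] = d.[KEY]
    AND e.[Location] IN (@Location)
)
ORDER BY d.GROUP_NAME ASC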
I want to compare two results of queries on the same table by checking the resulting row count, but Postgres doesn't support column aliases in the WHERE clause.
select id from article where version=1308
and exists(
select count(ident) as count1 from artprice AS p1
where p1.valid_to<to_timestamp(1586642400000) or p1.valid_from>to_timestamp(1672441199000)
and p1.article=article.id
and p1.count1=(select count(ident) from artprice where article=article.id)
)
I also cannot use aggregate functions in the where clause, so
select id from article where version=1308
and exists(
select count(ident) as count1 from artprice AS p1
where p1.valid_to<to_timestamp(1586642400000) or p1.valid_from>to_timestamp(1672441199000)
and p1.article=article.id
and p1.count(ident)=(select count(ident) from artprice where article=article.id)
)
also doesn't work. Any ideas?
EDIT:
What I want to get are articles where every article price is outside of a valid range defined by validFrom and validTo.
I now changed the statement by negating the positive conditions:
Select distinct article.id from Article article, ArtPrice price
where
(
(article.version=?)
and
(
(
(
(
(not(price.valid_from>=?)) or (not(price.valid_to<=?))
)
and
(
(not(price.valid_from<=?)) or (not(price.valid_to>=?))
)
)
and
(
(not(price.valid_to>=?)) or (not(price.valid_to<=?))
)
)
and
(
(not(price.valid_from>=?)) or (not(price.valid_from<=?))
)
)
) and article.id=price.article
Probably not the most elegant solution, but it works.
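(As an aside, the same "every price outside the range" requirement can also be written directly with NOT EXISTS. A rough sketch using the literals from the first query; note it also returns articles that have no prices at all, which may or may not be wanted:)
SELECT a.id
FROM article a
WHERE a.version = 1308
AND NOT EXISTS (
    SELECT 1
    FROM artprice p
    WHERE p.article = a.id
      AND p.valid_from <= to_timestamp(1672441199000)
      AND p.valid_to >= to_timestamp(1586642400000)
);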
Aggregates are not allowed in the WHERE clause, but there's the HAVING clause for them.
EDIT: What I want to get are articles where every article price is outside of a valid range defined by validFrom and validTo.
I think that bool_or() would be a good fit here when combined with range operations:
SELECT article.id
FROM Article AS article
JOIN ArtPrice AS price ON price.article = article.id
WHERE article.version = 1308
GROUP BY article.id
HAVING NOT bool_or(tsrange(price.valid_from, price.valid_to)
&& tsrange(to_timestamp(1586642400000),
to_timestamp(1672441199000)))
This reads as "...those not having any price tsrange that overlaps the given tsrange".
Postgresql also supports the SQL OVERLAPS operator:
(price.valid_from, price.valid_to) OVERLAPS (to_timestamp(1586642400000),
to_timestamp(1672441199000))
As a note, it operates on half-open intervals start <= time < end.
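For completeness, the HAVING clause above could use OVERLAPS instead of the tsrange && test; a sketch:
SELECT article.id
FROM Article AS article
JOIN ArtPrice AS price ON price.article = article.id
WHERE article.version = 1308
GROUP BY article.id
HAVING NOT bool_or((price.valid_from, price.valid_to)
                   OVERLAPS (to_timestamp(1586642400000),
                             to_timestamp(1672441199000)))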
I'm wondering if anybody can help me out with any or all of this code below. I've made it work, but it seems inefficient to me and is probably quite a bit slower than optimal.
Some basic background on the necessity of this code in the first place:
I have a table of shipping records that does not include the corresponding invoice number. I've looked all through the tables and I continue to do so. In fact, only this morning I discovered that if a packing slip has been generated then I can link the shipping table to the packing slip table via that packing slip ID and grab the invoice number from there. Absent that link, however, I'm forced to guess. In most instances, that's not terribly difficult, because the invoice table has number, line and release that can match up. But when there are multiple shipments for a number, line and release (for instance, when a line is partially shipped) then there can be multiple answers, only one of which is correct. I am partially helped by the presence of a column in the shipping table that states what the date sequence is for that number, line and release, but there are still circumstances where the process I use for "guessing" can be somewhat ambiguous.
What my procedure does is this. First, it creates a table of data that includes the invoice number if there was a pack slip to link it through.
Next, it dumps all of that data into a second table, this time using--only if the invoice was NULL in the first table--a "guess" about the invoice number based on partitioning all the shipping records by number, line, release, date sequence and date, and then comparing that to the same type of thing for the invoice table, and trying to line everything up by date.
Finally, it parses through that table and finds any last nulls and essentially matches them up with the first record of any invoice for that number, line and release.
Both guesses have added characters to show that they are, in fact, guesses.
IF OBJECT_ID('tempdb..#cosTable') IS NOT NULL
DROP TABLE #cosTable
DECLARE @cosTable2 TABLE (
ID INT IDENTITY
,co_num CoNumType
,co_line CoLineType
,co_release CoReleaseType
,date_seq DateSeqType
,ship_date DateType
,inv_num NVARCHAR(14)
)
DECLARE
@co_num_ck CoNumType
,@co_line_ck CoLineType
,@co_release_ck CoReleaseType
DECLARE @Counter1 INT = 0
SELECT cos.co_num, cos.co_line, cos.co_release, cos.date_seq, cos.ship_date, cos.qty_invoiced, pck.inv_num
INTO #cosTable
FROM co_ship cos
LEFT JOIN pckitem pck
ON cos.pack_num = pck.pack_num
AND cos.co_num = pck.co_num
AND cos.co_line = pck.co_line
AND cos.co_release = pck.co_release
;WITH cos_Order
AS(
SELECT co_num, co_line, co_release, qty_invoiced, date_seq, ship_date, ROW_NUMBER () OVER (PARTITION BY co_num, co_line, co_release ORDER BY ship_date) AS cosrow
FROM co_ship
WHERE qty_invoiced > 0
),
invi_Order
AS(
SELECT inv_num, co_num, co_line, co_release, ROW_NUMBER () OVER (PARTITION BY co_num, co_line, co_release ORDER BY RecordDate) AS invirow
FROM inv_item
WHERE qty_invoiced > 0
),
cos_invi
AS(
SELECT cosO.*, inviO.inv_num
FROM cos_Order cosO
LEFT JOIN invi_Order inviO
ON cosO.co_num = inviO.co_num AND cosO.co_line = inviO.co_line AND cosO.cosrow = inviO.invirow)
INSERT INTO @cosTable2
SELECT cosT.co_num, cosT.co_line, cosT.co_release, cosT.date_seq, cosT.ship_date, COALESCE(cosT.inv_num,'*'+cosi.inv_num) AS inv_num
FROM #cosTable cosT
LEFT JOIN cos_invi cosi
ON cosT.co_num = cosi.co_num
AND cosT.co_line = cosi.co_line
AND cosT.co_release = cosi.co_release
AND cosT.date_seq = cosi.date_seq
AND cosT.ship_date = cosi.ship_date
WHILE @Counter1 < (SELECT MAX(ID) FROM @cosTable2) BEGIN
SET @Counter1 += 1
SET @co_num_ck = (SELECT co_num FROM @cosTable2 WHERE ID = @Counter1)
SET @co_line_ck = (SELECT co_line FROM @cosTable2 WHERE ID = @Counter1)
SET @co_release_ck = (SELECT co_release FROM @cosTable2 WHERE ID = @Counter1)
IF EXISTS (SELECT * FROM @cosTable2 WHERE ID = @Counter1 AND inv_num IS NULL)
UPDATE @cosTable2
SET inv_num = '^' + (SELECT TOP 1 inv_num FROM @cosTable2 WHERE
@co_num_ck = co_num AND
@co_line_ck = co_line AND
@co_release_ck = co_release)
WHERE ID = @Counter1 AND inv_num IS NULL
END
SELECT * FROM @cosTable2
ORDER BY co_num, co_line, co_release, date_seq, ship_date
You're in a bad spot - as @craig.white and @HLGEM suggest, you've inherited something without sufficient constraints to make the data correct or safe...and now you have to "synthesize" it. I get that guesses are the best you can do, and you can, at least make your guesses reasonable performance-wise.
After that, you should squeal loudly to get some time to fix the db - to apply the constraints needed to prevent further crapification of the data.
Performance-wise, the while loop is a disaster. You'd be better off replacing that whole mess with a single update statement...something like:
update c0
set inv_num = '^' + c1.inv_num
from
@cosTable2 c0
left outer join
(
select
co_num,
co_line,
co_release,
inv_num
from
@cosTable2
where
inv_num is not null
group by
co_num,
co_line,
co_release,
inv_num
) as c1
on
c0.co_num = c1.co_num and
c0.co_line = c1.co_line and
c0.co_release = c1.co_release
where
c0.inv_num is null
...which does the same thing the loop does, only in a single statement.
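(If you specifically want to mirror the loop's TOP 1 pick when several invoice numbers exist for one co_num/co_line/co_release, a CROSS APPLY variant is another option. This is only a sketch, and the ORDER BY ID is an assumption, since the original TOP 1 had no ordering:)
update c0
set inv_num = '^' + x.inv_num
from @cosTable2 as c0
cross apply
(
    select top (1) c1.inv_num
    from @cosTable2 as c1
    where c1.co_num = c0.co_num
      and c1.co_line = c0.co_line
      and c1.co_release = c0.co_release
      and c1.inv_num is not null
    order by c1.ID
) as x
where c0.inv_num is null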
It seems to me that you are trying very hard to solve a problem that should not exist. What you describe is an unfortunately common situation where a process has grown organically, without intent or specific direction, as the business has grown, and that has made automated data extraction nearly impossible. You very much need a set of policies and procedures. For a (very crude and simple) example:
1: An Order must exist before a packing slip can be generated.
2: A packing slip must exist before an invoice can be generated.
3: An invoice is created using data from the packing slip and order (what was requested, what was picked, what do we bill).
-Again, this is a crude example just to illustrate the idea.
All of the data MUST be entered at the proper time or someone has not done their job.
It is not in the IT department's typical skill set to accurately and consistently provide management with good data when such data does not exist.
I am trying to calculate the average of a column in Tableau. The problem is that I need to use a single date value (based on a filter) from another data source, and only calculate the average where the exam date is <= that filtered date value from the other source.
Note: Parameters will not work for me here, since new date values are being added constantly to the set.
I have tried many different approaches, but the simplest was trying to use a calculated field that pulls in the filtered exam date from the other data source.
It can successfully pull the filtered date, but the formula does not work as expected. Two versions of the calculation are below:
IF DATE(ATTR([Exam Date])) <= DATE(ATTR([Averages (Tableau Test Scores)].[Updated])) THEN AVG([Raw Score]) END
IF DATEDIFF('day', DATE(ATTR([Exam Date])), DATE(ATTR([Averages (Tableau Test Scores)].[Updated]))) > 1 THEN AVG([Raw Score]) END
Basically, I am looking for the equivalent of this in SQL Server:
SELECT AVG([Raw Score]) WHERE ExamDate <= (Filtered Exam Date)
Below is a workbook that shows an example of what I am trying to accomplish. Currently it returns all blanks, likely due to the many-to-one comparison I am trying to use in my calculation.
Any feedback is greatly appreciated!
Tableau Test Exam Workbook
I was able to solve this by using Custom SQL to join the tables together and calculate the average based on my conditions, to get the column results I wanted.
Would still be great to have this ability directly in Tableau, but whatever gets the job done.
Edit:
SELECT
[AcademicYear]
,[Discipline]
--Get the number of student takers
,COUNT([Id]) AS [Students (N)]
--Get the average of the Raw Score
,CAST(AVG(RawScore) AS DECIMAL(10,2)) AS [School Mean]
--Get the number of failures based on an "adjusted score" column
,COUNT(CASE WHEN [AdjustedScore] < 70 THEN 1 END) AS [School Failures]
--This is the column used as the cutoff point for including scores
,[Average_Update].[Updated]
FROM [dbo].[Average] [Average]
FULL OUTER JOIN [dbo].[Average_Update] [Average_Update] ON ([Average_Update].[Id] = [Average].UpdateDateId)
--The meat of joining data for accurate calculations
FULL OUTER JOIN (
SELECT DISTINCT S.[Id], S.[LastName], S.[FirstName], S.[ExamDate], S.[RawScoreStandard], S.[RawScorePercent], S.[AdjustedScore], S.[Subject], P.[Id] AS PeriodId
FROM [StudentScore] S
FULL OUTER JOIN
(
--Get only the 1st attempt
SELECT DISTINCT [NBOMEId], S2.[Subject], MIN([ExamDate]) AS ExamDate
FROM [StudentScore] S2
GROUP BY [NBOMEId],S2.[Subject]
) B
ON S.[NBOMEId] = B.[NBOMEId] AND S.[Subject] = B.[Subject] AND S.[ExamDate] = B.[ExamDate]
--Group in "Exam Periods" based on the list of periods w/ start & end dates in another table.
FULL OUTER JOIN [ExamPeriod] P
ON S.[ExamDate] >= P.PeriodStart AND S.[ExamDate] <= P.PeriodEnd
WHERE S.[Subject] = B.[Subject]
GROUP BY P.[Id], S.[Subject], S.[ExamDate], S.[RawScoreStandard], S.[RawScorePercent], S.[AdjustedScore], S.[NBOMEId], S.[NBOMELastName], S.[NBOMEFirstName], S.[SecondYrTake]) [StudentScore]
ON
([StudentScore].PeriodId = [Average_Update].ExamPeriodId
AND [StudentScore].Subject = [Average].Subject
AND [StudentScore].[ExamDate] <= [Average_Update].[Updated])
--End meat
--Joins to pull in relevant data for normalized tables
FULL OUTER JOIN [dbo].[Student] [Student] ON ([StudentScore].[NBOMEId] = [Student].[NBOMEId])
INNER JOIN [dbo].[ExamPeriod] [ExamPeriod] ON ([Average_Update].ExamPeriodId = [ExamPeriod].[Id])
INNER JOIN [dbo].[AcademicYear] [AcademicYear] ON ([ExamPeriod].[AcademicYearId] = [AcademicYear].[Id])
--This will pull only the latest update entry for every academic year.
WHERE [Updated] IN (
SELECT DISTINCT MAX([Updated]) AS MaxDate
FROM [Average_Update]
GROUP BY[ExamPeriodId])
GROUP BY [AcademicYear].[AcademicYearText], [Average].[Subject], [Average_Update].[Updated]
ORDER BY [AcademicYear].[AcademicYearText], [Average_Update].[Updated], [Average].[Subject]
I couldn't download your file to test with your data, but try reversing the order in which the average is taken, i.e.
AVG(IF DATE(ATTR([Exam Date])) <= DATE(ATTR([Averages (Tableau Test Scores)].[Updated])) THEN [Raw Score] END)
As written, I believe you're averaging the data before returning it from the IF statement, whereas you want to return the data and then average it.
I have a large database in which I want to apply some logic to update new fields.
The primary key is id for the table harvard_assignees
The LOGIC GOES LIKE THIS
Select all of the records based on id
For each record (WHILE), if (state is NOT NULL && country is NULL), update country_out = "US" ELSE update country_out=country
I see step 1 as a PostgreSQL query and step 2 as a function. Just trying to figure out the easiest way to implement natively with the exact syntax.
====
The second function is a little more interesting, requiring (I believe) DISTINCT:
Find all DISTINCT foreign_keys (a bivariate key of pat_type,patent)
Count Records that contain that value (e.g., n=3 records have fkey "D","388585")
Update those 3 records to identify percent as 1/n (e.g., UPDATE 3 records, set percent = 1/3)
For the first one:
UPDATE
harvard_assignees
SET
country_out = (CASE
WHEN (state is NOT NULL AND country is NULL) THEN 'US'
ELSE country
END);
At first it had condition "id = ..." but I removed that because I believe you actually want to update all records.
And for the second one:
UPDATE
example_table
SET
percent = (SELECT 1.0/cnt FROM (SELECT count(*) AS cnt FROM example_table AS x WHERE x.fn_key_1 = example_table.fn_key_1 AND x.fn_key_2 = example_table.fn_key_2) AS tmp WHERE cnt > 0)
That one will be kind of slow though.
I'm thinking of a solution based on window functions; you may want to explore those too.
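For reference, a window-function version might look like the sketch below; it assumes the table has a primary key id (as harvard_assignees does) and reuses the generic column names from the update above:
-- rough sketch of the window-function idea (PostgreSQL)
UPDATE example_table AS t
SET percent = s.pct
FROM (
    SELECT id,
           1.0 / COUNT(*) OVER (PARTITION BY fn_key_1, fn_key_2) AS pct
    FROM example_table
) AS s
WHERE s.id = t.id;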