Sqlalchemy; count items between two dates over a time period - postgresql

My postgres DB has a Price table where I store price data for a bunch of products. For each Price object I store when it was created (Price.timestamp), and whenever there is a new price for the product, I create a new Price object, and for the old Price object I store when it ended (Price.end_time). Both times are datetime objects.
Now, I want to count how many Prices there are at over a time period. Easy I thought, so I did the query below:
trunc_date = db.func.date_trunc('day', Price.timestamp)
query = db.session.query(trunc_date, db.func.count(Price.id))
query = query.order_by(trunc_date.desc())
query = query.group_by(trunc_date)
prices_count = query.all()
Which is great, but only counts how many prices were new/created for each day. So what I thought I could do, was to filter so that I would get prices where the trunc_date is between the beginning and the end for the Price, like below:
query = query.filter(Price.timestamp < trunc_date < Price.time_ended)
But apparently you are not allowed to use trunc_date this way. Can anyone help me with how I am supposed to write my query?
Data example:
Price.id Price.timestamp Price.time_ended
1 2022-18-09 2022-26-09
2 2022-13-09 2022-20-09
The query result i would like to get is:
2022-27-09; 0
2022-26-09; 1
2022-25-09; 1
...
2022-20-09; 2
2022-19-09; 2
2022-18-09; 2
2022-17-09; 1
...
2022-12-09; 0

Have you tried separating the conditions inside the filter?
query = db.session.\
query(trunc_date, db.func.count(Price.id)).\
filter(
(Price.timestamp < trunc_date),
(trunc_date < Price.time_ended)
).\
group_by(trunc_date).\
order_by(trunc_date.desc()).\
all()

you can use
trunc_date.between(Price.timestamp, Price.time_ended)

I figured it out.
First I created a date range by using a subquery.
todays_date = datetime.today() - timedelta(days = 1)
numdays = 360
min_date = todays_date - timedelta(days = numdays)
date_series = db.func.generate_series(min_date , todays_date, timedelta(days=1))
trunc_date = db.func.date_trunc('days', date_series)
subquery = db.session.query(trunc_date.label('day')).subquery()
Then I used the subquery as input in my original query, and I was finally able to filter on the dates from the subquery.
query = db.session.query(subquery.c.day, db.func.count(Price.id))
query = query.order_by(subquery.c.day.desc())
query = query.group_by(subquery.c.day)
query = query.filter(Price.timestamp < subquery.c.day)
query = query.filter(Price.time_ended > subquery.c.day)
Now, query.all() will give you a nice list that counts the prices for each day specified in the date_series.

Related

How to query the first row efficiently?

I have a table with large amount of records:
date instrument price
2019.03.07 X 1.1
2019.03.07 X 1.0
2019.03.07 X 1.2
...
When I query for the day opening price, I use:
1 sublist select from prices where date = 2019.03.07, instrument = `X
It takes a long time to execute because it selects all the prices on that day and get the first one.
I also tried:
select from prices where date = 2019.03.07, instrument = `X, i = 0 //It does not return any record (why?)
select from prices where date = 2019.03.07, instrument = `X, i = first i //Seem to work. Does it?
In Oracle an equivalent will be:
select * from prices where date = to_date(...) and instrument = "X" and rownum = 1
and Oracle will stop immediately when it finds the first record.
How to do this in KDB (e.g. stop immediately after it finds the first record)?
In kdb, where subclauses in select statements are executed sequentially. i.e. only those records which pass the first "test" get passed to the second test. With that in mind, looking at your two attempts:
select from prices where date = 2019.03.07, instrument = `X, i = 0 //It does not return any record (why?)
This doesn't (necessarily) return anything, because by the time it gets to the i=0 check, you've already filtered out some records (possibly including the first record in the original table, which would have i=0)
select from prices where date = 2019.03.07, instrument = `X, i = first i //Seem to work. Does it?
This one should work. First you filter by date. Then within the records for that date, you select the records for instrument `X. Then within those records, you take the record where i is the first i (where i has already been filtered down, so first i is simply the index of the first record [still the index from the original table, not the filtered down version])
Q-SQL equivalent for that is select[n] which also performs better than other approaches in most of the cases. Positive 'n' will give first n records and negative will give last n records.
q) select[1] from prices where date = 2019.03.07, instrument = `X
There is no inbuilt functionality to stop after first match. You can write custom function for that but that would probably execute slower than above supported version.

Filter portal for most recently created record by group

I have a portal on my "Clients" table. The related table contains the results of surveys that are updated over time. For each combination of client and category (a field in the related table), I only want the portal to display the most recently collected row.
Here is a link to a trivial example that illustrates the issue I'm trying to address. I have two tables in this example (Related on ClientID):
Clients
Table 1 Get Summary Method
The Table 1 Get Summary Method table looks like this:
Where:
MaxDate is a summary field = Maximum of Date
MaxDateGroup is a calculated field = GetSummary ( MaxDate ;
ClientIDCategory )
ShowInPortal = If ( Date = MaxDateGroup ; 1 ; 0 )
The table is sorted on ClientIDCategory
Issue 1 that I'm stumped on: .
ShowInPortal should equal 1 in row 3 (PKTable01 = 5), row 4 (PKTable01 = 6), and row 6 (PKTable01 = 4) in the table above. I'm not sure why FM is interpreting 1Red and 1Blue as the same category, or perhaps I'm just misunderstanding what the GetSummary function does.
The Clients table looks like this:
Where:
The portal records are sorted on ClientIDCategory
Issue 2 that I'm stumped on:
I only want rows with a ShowInPortal value equal to 1 should appear in the portal. I tried creating a portal filter with the following formula: Table 1 Get Summary Method::ShowInPortal = 1. However, using that filter removes all row from the portal.
Any help is greatly appreciated.
One solution is to use ExecuteSQL to grab the Max Date. This removes the need for Summary functions and sorts, and works as expected. Propose to return it as number to avoid any issues with date formats.
GetAsTimestamp (
ExecuteSQL (
"SELECT DISTINCT COALESCE(MaxDate,'')
FROM Survey
WHERE ClientIDCategory = ? "
; "" ; "";ClientIDCategory )
)
Also, you need to change the ShowInPortal field to an unstored calc field with:
If ( GetAsNumber(Date) = MaxDateGroupSQL ; 1 ; 0 )
Then filter the portal on this field.
I can send you the sample file if you want.

get data from Moodle Database from code

I have a question on how I can extract data from Moodle based on a parameter thats "greater than" or "less than" a given value.
For instance, I'd like to do something like:
**$record = $DB->get_record_sql('SELECT * FROM {question_attempts} WHERE questionid > ?', array(1));**
How can I achieve this, cause each time that I try this, I get a single record in return, instead of all the rows that meet this certain criteria.
Also, how can I get a query like this to work perfectly?
**$sql = ('SELECT * FROM {question_attempts} qa join {question_attempt_steps} qas on qas.questionattemptid = qa.id');**
In the end, I want to get all the quiz question marks for each user on the system, in each quiz.
Use $DB->get_records_sql() instead of $DB->get_record_sql, if you want more than one record to be returned.
Thanks Davo for the response back then (2016, wow!). I did manage to learn this over time.
Well, here is an example of a proper query for getting results from Moodle DB, using the > or < operators:
$quizid = 100; // just an example param here
$cutoffmark = 40 // anyone above 40% gets a Moodle badge!!
$sql = "SELECT q.name, qg.userid, qg.grade FROM {quiz} q JOIN {quiz_grades} qg ON qg.quiz = q.id WHERE q.id = ? AND qg.grade > ?";
$records = $DB->get_records_sql($sql, [$quizid, $cutoffmark]);
The query will return a record of quiz results with all student IDs and grades, who have a grade of over 40.

Tableau - Calculating average where date is less than value from another data source

I am trying to calculate the average of a column in Tableau, except the problem is I am trying to use a single date value (based on filter) from another data source to only calculate the average where the exam date is <= the filtered date value from the other source.
Note: Parameters will not work for me here, since new date values are being added constantly to the set.
I have tried many different approaches, but the simplest was trying to use a calculated field that pulls in the filtered exam date from the other data source.
It successfully can pull the filtered date, but the formula does not work as expected. 2 versions of the calculation are below:
IF DATE(ATTR([Exam Date])) <= DATE(ATTR([Averages (Tableau Test Scores)].[Updated])) THEN AVG([Raw Score]) END
IF DATEDIFF('day', DATE(ATTR([Exam Date])), DATE(ATTR([Averages (Tableau Test Scores)].[Updated]))) > 1 THEN AVG([Raw Score]) END
Basically, I am looking for the equivalent of this in SQL Server:
SELECT AVG([Raw Score]) WHERE ExamDate <= (Filtered Exam Date)
Below a workbook that shows an example of what I am trying to accomplish. Currently it returns all blanks, likely due to the many-to-one comparison I am trying to use in my calculation.
Any feedback is greatly appreciated!
Tableau Test Exam Workbook
I was able to solve this by using Custom SQL to join the tables together and calculate the average based on my conditions, to get the column results I wanted.
Would still be great to have this ability directly in Tableau, but whatever gets the job done.
Edit:
SELECT
[AcademicYear]
,[Discipline]
--Get the number of student takers
,COUNT([Id]) AS [Students (N)]
--Get the average of the Raw Score
,CAST(AVG(RawScore) AS DECIMAL(10,2)) AS [School Mean]
--Get the number of failures based on an "adjusted score" column
,COUNT([AdjustedScore] < 70 THEN 1 END) AS [School Failures]
--This is the column used as the cutoff point for including scores
,[Average_Update].[Updated]
FROM [dbo].[Average] [Average]
FULL OUTER JOIN [dbo].[Average_Update] [Average_Update] ON ([Average_Update].[Id] = [Average].UpdateDateId)
--The meat of joining data for accurate calculations
FULL OUTER JOIN (
SELECT DISTINCT S.[Id], S.[LastName], S.[FirstName], S.[ExamDate], S.[RawScoreStandard], S.[RawScorePercent], S.[AdjustedScore], S.[Subject], P.[Id] AS PeriodId
FROM [StudentScore] S
FULL OUTER JOIN
(
--Get only the 1st attempt
SELECT DISTINCT [NBOMEId], S2.[Subject], MIN([ExamDate]) AS ExamDate
FROM [StudentScore] S2
GROUP BY [NBOMEId],S2.[Subject]
) B
ON S.[NBOMEId] = B.[NBOMEId] AND S.[Subject] = B.[Subject] AND S.[ExamDate] = B.[ExamDate]
--Group in "Exam Periods" based on the list of periods w/ start & end dates in another table.
FULL OUTER JOIN [ExamPeriod] P
ON S.[ExamDate] = P.PeriodStart AND S.[ExamDate] <= P.PeriodEnd
WHERE S.[Subject] = B.[Subject]
GROUP BY P.[Id], S.[Subject], S.[ExamDate], S.[RawScoreStandard], S.[RawScorePercent], S.[AdjustedScore], S.[NBOMEId], S.[NBOMELastName], S.[NBOMEFirstName], S.[SecondYrTake]) [StudentScore]
ON
([StudentScore].PeriodId = [Average_Update].ExamPeriodId
AND [StudentScore].Subject = [Average].Subject
AND [StudentScore].[ExamDate] <= [Average_Update].[Updated])
--End meat
--Joins to pull in relevant data for normalized tables
FULL OUTER JOIN [dbo].[Student] [Student] ON ([StudentScore].[NBOMEId] = [Student].[NBOMEId])
INNER JOIN [dbo].[ExamPeriod] [ExamPeriod] ON ([Average_Update].ExamPeriodId = [ExamPeriod].[Id])
INNER JOIN [dbo].[AcademicYear] [AcademicYear] ON ([ExamPeriod].[AcademicYearId] = [AcademicYear].[Id])
--This will pull only the latest update entry for every academic year.
WHERE [Updated] IN (
SELECT DISTINCT MAX([Updated]) AS MaxDate
FROM [Average_Update]
GROUP BY[ExamPeriodId])
GROUP BY [AcademicYear].[AcademicYearText], [Average].[Subject], [Average_Update].[Updated],
ORDER BY [AcademicYear].[AcademicYearText], [Average_Update].[Updated], [Average].[Subject]
I couldn't download your file to test with your data, but try reversing the order of taking the average ie
average(IF DATE(ATTR([Exam Date])) <= DATE(ATTR([Averages (Tableau Test Scores)].[Updated]) then [Raw Score]) END)
as written, I believe you'll be averaging the data before returning it from the if statement, whereas you want to return the data, then average it.

MySQL select rows where date not between date

I have a booking system in which I need to select any available room from the database. The basic setup is:
table: room
columns: id, maxGuests
table: roombooking
columns: id, startDate, endDate
table: roombooking_room:
columns: id, room_id, roombooking_id
I need to select rooms that can fit the requested guests in, or select two (or more) rooms to fit the guests in (as defined by maxGuests, obviously using the lowest/closet maxGuests first)
I could loop through my date range and use this sql:
SELECT `id`
FROM `room`
WHERE `id` NOT IN
(
SELECT `roombooking_room`.`room_id`
FROM `roombooking_room`, `roombooking`
WHERE `roombooking`.`confirmed` =1
AND DATE(%s) BETWEEN `roombooking`.`startDate` AND `roombooking`.`endDate`
)
AND `room`.`maxGuests`>=%d
Where %$1 is the looped date and %2d is the number of guests to be booked in. But this will just return false if there are more guests than any room can take, and there must be a quicker way of doing this rather than looping with php and running the query?
This is similar to part of the sql I was thinking of: Getting Dates between a range of dates but with Mysql
Solution, based on ircmaxwell's answer:
$query = sprintf(
"SELECT `id`, `maxGuests`
FROM `room`
WHERE `id` NOT IN
(
SELECT `roombooking_room`.`room_id`
FROM `roombooking_room`
JOIN `roombooking` ON `roombooking_room`.`roombooking_id` = `roombooking`.`id`
WHERE `roombooking`.`confirmed` =1
AND (`roomBooking`.`startDate` > DATE(%s) OR `roomBooking`.`endDate` < DATE(%s))
)
AND `maxGuests` <= %d ORDER BY `maxGuests` DESC",
$endDate->toString('yyyy-MM-dd'), $startDate->toString('yyyy-MM-dd'), $noGuests);
$result = $db->query($query);
$result = $result->fetchAll();
$rooms = array();
$guests = 0;
foreach($result as $res) {
if($guests >= $noGuests) break;
$guests += (int)$res['maxGuests'];
$rooms[] = $res['id'];
}
Assuming that you are interested to place #Guests from #StartDate to #EndDate
SELECT DISTINCT r.id,
FROM room r
LEFT JOIN roombooking_room rbr ON r.id = rbr.room_id
LEFT JOIN roombooking ON rbr.roombooking_id = rb.id
WHERE COALESCE(#StartDate NOT BETWEEN rb.startDate AND rb.endDate, TRUE)
AND COALESCE(#EndDate NOT BETWEEN rb.startDate AND rb.endDate, TRUE)
AND #Guests < r.maxGuests
should give you a list of all rooms that are free and can accommodate given number of guests for the given period.
NOTES
This query works only for single rooms, if you want to look at multiple rooms you will need to apply the same criteria to a combination of rooms. For this you would need recursive queries or some helper tables.
Also, COALESCE is there to take care of NULLs - if a room is not booked at all it would not have any records with dates to compare to, so it would not return completely free rooms. Date between date1 and date2 will return NULL if either date1 or date2 is null and coalesce will turn it to true (alternative is to do a UNION of completely free rooms; which might be faster).
With multiple rooms things get really interesting.
Is that scenario big part of your problem? And which database are you using i.e. do you have access to recursive queries?
EDIT
As I stated multiple times before, your way of looking for a solution (greedy algorithm that looks at the largest free rooms first) is not the optimal if you want to get the best fit between required number of guests and rooms.
So, if you replace your foreach with
$bestCapacity = 0;
$bestSolution = array();
for ($i = 1; $i <= pow(2,sizeof($result))-1; $i++) {
$solutionIdx = $i;
$solutionGuests = 0;
$solution = array();
$j = 0;
while ($solutionIdx > 0) :
if ($solutionIdx % 2 == 1) {
$solution[] = $result[$j]['id'];
$solutionGuests += $result[$j]['maxGuests'];
}
$solutionIdx = intval($solutionIdx/2);
$j++;
endwhile;
if (($solutionGuests <= $bestCapacity || $bestCapacity == 0) && $solutionGuests >= $noGuests) {
$bestCapacity = $solutionGuests;
$bestSolution = $solution;
}
}
print_r($bestSolution);
print_r($bestCapacity);
Will go through all possible combinations and find the solution that wastes the least number of spaces.
Ok, first off, the inner query you're using is a cartesian join, and will be VERY expensive. You need to specify join criteria (roombooking_room.booking_id = roombooking.id for example).
Secondly, assuming that you have a range of dates, what can we say about that? Well, let's call the start of your range rangeStartDate and rangeEndDate.
Now, what can we say about any other range of dates that does not have any form of overlap with this range? Well, the endDate must not be between be either the rangeStartDate and the rangeEndDate. Same with the startDate. And the rangeStartDate (and rangeEndDate, but we don't need to check it) cannot be between startDate and endDate...
So, assuming %1$s is rangeStartDate and %2$s is rangeEndDate, a comprehensive where clause might be:
WHERE `roomBooking`.`startDate` NOT BETWEEN %1$s AND %2s
AND `roomBooking`.`endDate` NOT BETWEEN %1$s AND %2$$s
AND %1s NOT BETWEEN `roomBooking`.`startDate` AND `roomBooking`.`endDate`
But, there's a simpler way of saying that. The only way for a range to be outside of another is for the start_date to be after the end_date, or the end_date be before the start_id
So, assuming %1$s is rangeStartDate and %2$s is rangeEndDate, another comprehensive where clause might be:
WHERE `roomBooking`.`startDate` > %2$s
OR `roomBooking`.`endDate` < %1$s
So, that brings your overall query to:
SELECT `id`
FROM `room`
WHERE `id` NOT IN
(
SELECT `roombooking_room`.`room_id`
FROM `roombooking_room`
JOIN `roombooking` ON `roombooking_room`.`roombooking_id` = `roombooking`.`id`
WHERE `roombooking`.`confirmed` =1
AND (`roomBooking`.`startDate` > %2$s
OR `roomBooking`.`endDate` < %1$s)
)
AND `room`.`maxGuests`>=%d
There are other ways of doing this as well, so keep looking...
SELECT rooms.id
FROM rooms LEFT JOIN bookings
ON booking.room_id = rooms.id
WHERE <booking overlaps date range of interest> AND <wherever else>
GROUP BY rooms.id
HAVING booking.id IS NULL
I might be miss remembering how left join works so you might need to use a slightly different condition on the having, maybe a count or a sum.
At worst, with suitable indexes, that should scan half the bookings.