Returning max version number per interval period timestamp - entity-framework

I have a dataset with the following fields:
IntervalPeriodTimestamp
RegisterTypeCode
ReadingReplacementVersion
IntervalValue
I receive readings every 30 minutes but sometimes these readings are replaced and the readingreplacementversionumber is incremented.
Using LINQ how can I return only a list of the largest reading replacementversion for each half hourly period?
I have tried:
db.IntervalInformations
.Where(ii => ii.ChannelInformation.RegisterTypeCode == "62")
.OrderBy(ii => ii.IntervalPeriodTimestamp)
.ThenBy(ii => ii.ChannelInformation.Meter.messageType342.ReadingReplacementVersionNumber)
.ToList();
This returns a list sorted by Interval Period Timestamp and then by Replacement Version but I only want to return the list of interval informations with the MAX one. I have been able to achieve this in SQL using code:
WITH CTE AS
(
select SerialNumber, RegisterTypeCode, IntervalStatusCode, UOM_Code,
IntervalPeriodTimestamp, ReadingReplacementVersionNumber, IntervalValue,
RN = ROW_NUMBER()
OVER (PARTITION BY IntervalPeriodTimestamp
ORDER BY ReadingReplacementVersionNumber desc)
from MarketMessage as a
inner join messageType342 as b on a.MarketMessageID = b.MarketMessageID
inner join Meter as c on b.messageType342ID = c.MessageType342ID
inner join ChannelInformation as d on c.MeterID = d.MeterID
inner join IntervalInformation as e on d.ChannelInformationID = e.ChannelInformationID
where MPRN = 101010101010
and CAST(IntervalPeriodTimestamp as date) between '1 feb 2018' and '28 feb 2018'
and RegisterTypeCode = 62
)
select * from CTE
WHERE RN = 1
Not sure how I can convert this to LINQ statement

Try something like this: group your items into the different groups from which you want only one record, order the records by ReadingReplacementVersionNumber in descendent order, and get the first record in each group. Easier to understand with some code:
db.IntervalInformations
.Where(ii => ii.ChannelInformation.RegisterTypeCode == "62")
.OrderByDescending(ii => ii.ReadingReplacementVersionNumber )
.GroupBy(x => new { x.SerialNumber, x.RegisterTypeCode, x.IntervalStatusCode })
.Select(g => g.First());
The list of properties in GroupBy in your real query may be different. You have to define correctly the criteria for composing the groups. Include in the GroupBy list the combination of properties that will have the same value in each one of your groups, and different values in other groups.

Related

Return closest timestamp from Table B based on timestamp from Table A with matching Product IDs

Goal: Create a query to pull the closest cycle count event (Table C) for a product ID based on the inventory adjustments results sourced from another table (Table A).
All records from Table A will be used, but is not guaranteed to have a match in Table C.
The ID column will be present in both tables, but is not unique in either, so that pair of IDs and Timestamps together are needed for each table.
Current simplified SQL
SELECT
A.WHENOCCURRED,
A.LPID,
A.ITEM,
A.ADJQTY,
C.WHENOCCURRED,
C.LPID,
C.LOCATION,
C.ITEM,
C.QUANTITY,
C.ENTQUANTITY
FROM
A
LEFT JOIN
C
ON A.LPID = C.LPID
WHERE
A.facility = 'FACID'
AND A.WHENOCCURRED > '23-DEC-22'
AND A.ADJREASONABBREV = 'CYCLE COUNTS'
ORDER BY A.WHENOCCURRED DESC
;
This is currently pulling the first hit on C.WHENOCCURRED on the LPID matches. Want to see if there is a simpler JOIN solution before going in a direction that creates 2 temp tables based on WHENOCCURRED.
I have a functioning INDEX(MATCH(MIN()) solution in Excel but that requires exporting a couple system reports first and is extremely slow with X,XXX row tables.
If you are using Oracle 12 or later, you can use a LATERAL join and FETCH FIRST ROW ONLY:
SELECT A.WHENOCCURRED,
A.LPID,
A.ITEM,
A.ADJQTY,
C.WHENOCCURRED,
C.LPID,
C.LOCATION,
C.ITEM,
C.QUANTITY,
C.ENTQUANTITY
FROM A
LEFT OUTER JOIN LATERAL (
SELECT *
FROM C
WHERE A.LPID = C.LPID
AND A.whenoccurred <= c.whenoccurred
ORDER BY c.whenoccurred
FETCH FIRST ROW ONLY
) C
ON (1 = 1) -- The join condition is inside the lateral join
WHERE A.facility = 'FACID'
AND A.WHENOCCURRED > DATE '2022-12-23'
AND A.ADJREASONABBREV = 'CYCLE COUNTS'
ORDER BY A.WHENOCCURRED DESC;

Selecting max value grouped by specific column

Focused DB tables:
Task:
For given location ID and culture ID, get max(crop_yield.value) * culture_price.price (let's call this multiplication monetaryGain) grouped by year, so something like:
[
{
"year":2014,
"monetaryGain":...
},
{
"year":2015,
"monetaryGain":...
},
{
"year":2016,
"monetaryGain":...
},
...
]
Attempt:
SELECT cp.price * max(cy.value) AS monetaryGain, EXTRACT(YEAR FROM cy.date) AS year
FROM culture_price AS cp
JOIN culture AS c ON cp.id_culture = c.id
JOIN crop_yield AS cy ON cy.id_culture = c.id
WHERE c.id = :cultureId AND cy.id_location = :locationId AND cp.year = year
GROUP BY year
ORDER BY year
The problem:
"columns "cp.price", "cy.value" and "cy.date" must appear in the GROUP BY clause or be used in an aggregate function"
If I put these three columns in GROUP BY, I won't get expected result - It won't be grouped just by year obviously.
Does anyone have an idea on how to fix/write this query better in order to get task result?
Thanks in advance!
The fix
Rewrite monetaryGain to be:
max(cp.price * cy.value) AS monetaryGain
That way you will not be required to group by cp.price because it is not outputted as an group member, but used in aggregate.
Why?
When you write GROUP BY query you can output only columns that are in GROUP BY list and aggregate function values. Well this is expected - you expect single row per group, but you may have several distinct values for the field that is not in grouping column list.
For the same reason you can not use a non grouping column(-s) in arithmetic or any other (not aggregate) function because this would lead in several results for in single row - there would not be a way to display.
This is VERY loose explanation but I hope will help to grasp the concept.
Aliases in GROUP BY
Also you should not use aliases in GROUP BY. Use:
GROUP BY EXTRACT(YEAR FROM cy.date)
Using alias in GROUP BY is not allowed. This link might explain why: https://www.postgresql.org/message-id/7608.1259177709%40sss.pgh.pa.us

Tableau - Calculating average where date is less than value from another data source

I am trying to calculate the average of a column in Tableau, except the problem is I am trying to use a single date value (based on filter) from another data source to only calculate the average where the exam date is <= the filtered date value from the other source.
Note: Parameters will not work for me here, since new date values are being added constantly to the set.
I have tried many different approaches, but the simplest was trying to use a calculated field that pulls in the filtered exam date from the other data source.
It successfully can pull the filtered date, but the formula does not work as expected. 2 versions of the calculation are below:
IF DATE(ATTR([Exam Date])) <= DATE(ATTR([Averages (Tableau Test Scores)].[Updated])) THEN AVG([Raw Score]) END
IF DATEDIFF('day', DATE(ATTR([Exam Date])), DATE(ATTR([Averages (Tableau Test Scores)].[Updated]))) > 1 THEN AVG([Raw Score]) END
Basically, I am looking for the equivalent of this in SQL Server:
SELECT AVG([Raw Score]) WHERE ExamDate <= (Filtered Exam Date)
Below a workbook that shows an example of what I am trying to accomplish. Currently it returns all blanks, likely due to the many-to-one comparison I am trying to use in my calculation.
Any feedback is greatly appreciated!
Tableau Test Exam Workbook
I was able to solve this by using Custom SQL to join the tables together and calculate the average based on my conditions, to get the column results I wanted.
Would still be great to have this ability directly in Tableau, but whatever gets the job done.
Edit:
SELECT
[AcademicYear]
,[Discipline]
--Get the number of student takers
,COUNT([Id]) AS [Students (N)]
--Get the average of the Raw Score
,CAST(AVG(RawScore) AS DECIMAL(10,2)) AS [School Mean]
--Get the number of failures based on an "adjusted score" column
,COUNT([AdjustedScore] < 70 THEN 1 END) AS [School Failures]
--This is the column used as the cutoff point for including scores
,[Average_Update].[Updated]
FROM [dbo].[Average] [Average]
FULL OUTER JOIN [dbo].[Average_Update] [Average_Update] ON ([Average_Update].[Id] = [Average].UpdateDateId)
--The meat of joining data for accurate calculations
FULL OUTER JOIN (
SELECT DISTINCT S.[Id], S.[LastName], S.[FirstName], S.[ExamDate], S.[RawScoreStandard], S.[RawScorePercent], S.[AdjustedScore], S.[Subject], P.[Id] AS PeriodId
FROM [StudentScore] S
FULL OUTER JOIN
(
--Get only the 1st attempt
SELECT DISTINCT [NBOMEId], S2.[Subject], MIN([ExamDate]) AS ExamDate
FROM [StudentScore] S2
GROUP BY [NBOMEId],S2.[Subject]
) B
ON S.[NBOMEId] = B.[NBOMEId] AND S.[Subject] = B.[Subject] AND S.[ExamDate] = B.[ExamDate]
--Group in "Exam Periods" based on the list of periods w/ start & end dates in another table.
FULL OUTER JOIN [ExamPeriod] P
ON S.[ExamDate] = P.PeriodStart AND S.[ExamDate] <= P.PeriodEnd
WHERE S.[Subject] = B.[Subject]
GROUP BY P.[Id], S.[Subject], S.[ExamDate], S.[RawScoreStandard], S.[RawScorePercent], S.[AdjustedScore], S.[NBOMEId], S.[NBOMELastName], S.[NBOMEFirstName], S.[SecondYrTake]) [StudentScore]
ON
([StudentScore].PeriodId = [Average_Update].ExamPeriodId
AND [StudentScore].Subject = [Average].Subject
AND [StudentScore].[ExamDate] <= [Average_Update].[Updated])
--End meat
--Joins to pull in relevant data for normalized tables
FULL OUTER JOIN [dbo].[Student] [Student] ON ([StudentScore].[NBOMEId] = [Student].[NBOMEId])
INNER JOIN [dbo].[ExamPeriod] [ExamPeriod] ON ([Average_Update].ExamPeriodId = [ExamPeriod].[Id])
INNER JOIN [dbo].[AcademicYear] [AcademicYear] ON ([ExamPeriod].[AcademicYearId] = [AcademicYear].[Id])
--This will pull only the latest update entry for every academic year.
WHERE [Updated] IN (
SELECT DISTINCT MAX([Updated]) AS MaxDate
FROM [Average_Update]
GROUP BY[ExamPeriodId])
GROUP BY [AcademicYear].[AcademicYearText], [Average].[Subject], [Average_Update].[Updated],
ORDER BY [AcademicYear].[AcademicYearText], [Average_Update].[Updated], [Average].[Subject]
I couldn't download your file to test with your data, but try reversing the order of taking the average ie
average(IF DATE(ATTR([Exam Date])) <= DATE(ATTR([Averages (Tableau Test Scores)].[Updated]) then [Raw Score]) END)
as written, I believe you'll be averaging the data before returning it from the if statement, whereas you want to return the data, then average it.

How to write row_number() over clause in linq to Entity Framework

Using Entity Framework and LINQ, I need to get row number together with an entity, e.g. for loan I have multiple invoices and I want to select specific invoice together with its sequence number.
Basically, I'd need to know how to write equivalent to this:
select
nr, i.*
from
[Invoices] i
inner join
(select
row_number() over (order by IssueDate) nr, id
from
[Invoices]
where
LoanId = 5) t on t.id = i.id
where
i.id = 207
According to this post ROW_NUMBER is not supported in L2E. If you don't mind the overhead of loading all invoices for a given LoanId into memory, then it can be be done easily in C# with the overload of the Select method on IEnumerable that produces an index for each item, e.g.:
//First select the invoices
var invoices = from i in dbContext.Invoices
where i.LoanId == 5
order by i.IssueDate
select i;
var indexedInvoice = invoices.ToList().Select((i, count) => new { Invoice = i, RowCount = count })
.First(ii => ii.Invoice.Id == 207);
I can see how this can be less than optimal in some situations, so you might consider bypassing L2E here and execute your query as a plain old sql string, depending on how performance-critical it is, and on how many invoices there usually are for a single loan.

MySQL select rows where date not between date

I have a booking system in which I need to select any available room from the database. The basic setup is:
table: room
columns: id, maxGuests
table: roombooking
columns: id, startDate, endDate
table: roombooking_room:
columns: id, room_id, roombooking_id
I need to select rooms that can fit the requested guests in, or select two (or more) rooms to fit the guests in (as defined by maxGuests, obviously using the lowest/closet maxGuests first)
I could loop through my date range and use this sql:
SELECT `id`
FROM `room`
WHERE `id` NOT IN
(
SELECT `roombooking_room`.`room_id`
FROM `roombooking_room`, `roombooking`
WHERE `roombooking`.`confirmed` =1
AND DATE(%s) BETWEEN `roombooking`.`startDate` AND `roombooking`.`endDate`
)
AND `room`.`maxGuests`>=%d
Where %$1 is the looped date and %2d is the number of guests to be booked in. But this will just return false if there are more guests than any room can take, and there must be a quicker way of doing this rather than looping with php and running the query?
This is similar to part of the sql I was thinking of: Getting Dates between a range of dates but with Mysql
Solution, based on ircmaxwell's answer:
$query = sprintf(
"SELECT `id`, `maxGuests`
FROM `room`
WHERE `id` NOT IN
(
SELECT `roombooking_room`.`room_id`
FROM `roombooking_room`
JOIN `roombooking` ON `roombooking_room`.`roombooking_id` = `roombooking`.`id`
WHERE `roombooking`.`confirmed` =1
AND (`roomBooking`.`startDate` > DATE(%s) OR `roomBooking`.`endDate` < DATE(%s))
)
AND `maxGuests` <= %d ORDER BY `maxGuests` DESC",
$endDate->toString('yyyy-MM-dd'), $startDate->toString('yyyy-MM-dd'), $noGuests);
$result = $db->query($query);
$result = $result->fetchAll();
$rooms = array();
$guests = 0;
foreach($result as $res) {
if($guests >= $noGuests) break;
$guests += (int)$res['maxGuests'];
$rooms[] = $res['id'];
}
Assuming that you are interested to place #Guests from #StartDate to #EndDate
SELECT DISTINCT r.id,
FROM room r
LEFT JOIN roombooking_room rbr ON r.id = rbr.room_id
LEFT JOIN roombooking ON rbr.roombooking_id = rb.id
WHERE COALESCE(#StartDate NOT BETWEEN rb.startDate AND rb.endDate, TRUE)
AND COALESCE(#EndDate NOT BETWEEN rb.startDate AND rb.endDate, TRUE)
AND #Guests < r.maxGuests
should give you a list of all rooms that are free and can accommodate given number of guests for the given period.
NOTES
This query works only for single rooms, if you want to look at multiple rooms you will need to apply the same criteria to a combination of rooms. For this you would need recursive queries or some helper tables.
Also, COALESCE is there to take care of NULLs - if a room is not booked at all it would not have any records with dates to compare to, so it would not return completely free rooms. Date between date1 and date2 will return NULL if either date1 or date2 is null and coalesce will turn it to true (alternative is to do a UNION of completely free rooms; which might be faster).
With multiple rooms things get really interesting.
Is that scenario big part of your problem? And which database are you using i.e. do you have access to recursive queries?
EDIT
As I stated multiple times before, your way of looking for a solution (greedy algorithm that looks at the largest free rooms first) is not the optimal if you want to get the best fit between required number of guests and rooms.
So, if you replace your foreach with
$bestCapacity = 0;
$bestSolution = array();
for ($i = 1; $i <= pow(2,sizeof($result))-1; $i++) {
$solutionIdx = $i;
$solutionGuests = 0;
$solution = array();
$j = 0;
while ($solutionIdx > 0) :
if ($solutionIdx % 2 == 1) {
$solution[] = $result[$j]['id'];
$solutionGuests += $result[$j]['maxGuests'];
}
$solutionIdx = intval($solutionIdx/2);
$j++;
endwhile;
if (($solutionGuests <= $bestCapacity || $bestCapacity == 0) && $solutionGuests >= $noGuests) {
$bestCapacity = $solutionGuests;
$bestSolution = $solution;
}
}
print_r($bestSolution);
print_r($bestCapacity);
Will go through all possible combinations and find the solution that wastes the least number of spaces.
Ok, first off, the inner query you're using is a cartesian join, and will be VERY expensive. You need to specify join criteria (roombooking_room.booking_id = roombooking.id for example).
Secondly, assuming that you have a range of dates, what can we say about that? Well, let's call the start of your range rangeStartDate and rangeEndDate.
Now, what can we say about any other range of dates that does not have any form of overlap with this range? Well, the endDate must not be between be either the rangeStartDate and the rangeEndDate. Same with the startDate. And the rangeStartDate (and rangeEndDate, but we don't need to check it) cannot be between startDate and endDate...
So, assuming %1$s is rangeStartDate and %2$s is rangeEndDate, a comprehensive where clause might be:
WHERE `roomBooking`.`startDate` NOT BETWEEN %1$s AND %2s
AND `roomBooking`.`endDate` NOT BETWEEN %1$s AND %2$$s
AND %1s NOT BETWEEN `roomBooking`.`startDate` AND `roomBooking`.`endDate`
But, there's a simpler way of saying that. The only way for a range to be outside of another is for the start_date to be after the end_date, or the end_date be before the start_id
So, assuming %1$s is rangeStartDate and %2$s is rangeEndDate, another comprehensive where clause might be:
WHERE `roomBooking`.`startDate` > %2$s
OR `roomBooking`.`endDate` < %1$s
So, that brings your overall query to:
SELECT `id`
FROM `room`
WHERE `id` NOT IN
(
SELECT `roombooking_room`.`room_id`
FROM `roombooking_room`
JOIN `roombooking` ON `roombooking_room`.`roombooking_id` = `roombooking`.`id`
WHERE `roombooking`.`confirmed` =1
AND (`roomBooking`.`startDate` > %2$s
OR `roomBooking`.`endDate` < %1$s)
)
AND `room`.`maxGuests`>=%d
There are other ways of doing this as well, so keep looking...
SELECT rooms.id
FROM rooms LEFT JOIN bookings
ON booking.room_id = rooms.id
WHERE <booking overlaps date range of interest> AND <wherever else>
GROUP BY rooms.id
HAVING booking.id IS NULL
I might be miss remembering how left join works so you might need to use a slightly different condition on the having, maybe a count or a sum.
At worst, with suitable indexes, that should scan half the bookings.