MIN function in SQL not working as expected - postgresql

I have a table that lists players of a game and the history of their level changes within the game (beginner, intermediate, advanced, elite, pro). I am trying to build a query that will accurately identify people who reached a level of advanced or higher for the first time so far in 2021 (it is possible for players to skip ranks, so I want the earliest date that they reached either the advanced, elite, or pro levels). I'm currently using this query:
SELECT *
FROM (
SELECT p."ID", ra."NewRank", MIN(ra."EffectiveDate") AS "first_time_adv"
FROM rankachievement ra
JOIN player p
ON ra."ID" = p."ID"
WHERE ra."NewRank" IN ('Advanced',
'Elite',
'Pro')
GROUP BY 1, 2) AS t
WHERE t."first_time_adv" > '1/1/2021'
ORDER BY 1
Right now, this is pulling in all of the people who reached the advanced level for the first time in 2021, but it is also pulling in some people who had previously reached advanced in 2020 and have now achieved an even higher level. I don't want to include these people, which is why I used MIN with the date.
For example, it is correctly pulling in Players 1 and 2, who both reached advanced on January 2nd, and player 4, who reached elite on January 4th after skipping over the advanced level (went straight from intermediate to elite). But it is also pulling in Player 3, who reached advanced on December 30th, 2020 and then reached elite on January 10th- I do not want player 3 to be included because they reached advanced for the first time before 2021. How can I get my query to exclude people like this?

You're getting two results for Player 3, one for Advanced and one for Elite, because you're grouping by NewRank. The row where Player 3 reached Advanced is removed from the result set by your WHERE t."first_time_adv" > '1/1/2021', but the Elite row passes through. I suggest using MIN() as a window function with OVER (PARTITION BY ...).
Your results from the inner query include something like this:
id | new_rank | MIN(EffectiveDate)
---+----------+-------------------
3  | Advanced | the min date for this record (12/30/2020)
3  | Elite    | the min date for this record (01/10/2021)
This is because you're getting the MIN while grouping by both ID AND NewRank. You want the MIN over ALL of that player's Advanced, Elite, and Pro records. If you grouped by ID only, you would get the behavior you were looking for, but that would require you to remove NewRank from the SELECT clause.
I suspect you want the rank in your final result set, so try something like this:
WITH data AS
(
SELECT p."ID"
, ra."NewRank"
, ra."EffectiveDate"
, MIN(ra."EffectiveDate")
OVER (PARTITION BY p."ID") AS "first_time_adv"
FROM rankachievement ra
JOIN player p ON ra."ID" = p."ID"
WHERE ra."NewRank" IN ('Advanced', 'Elite', 'Pro')
)
SELECT *
FROM data d
WHERE d."first_time_adv" >= DATE '2021-01-01'
ORDER BY 1
;
Previously you were computing a separate MIN(EffectiveDate) for each (ID, NewRank) pair; now the window computes one minimum across all of a player's Advanced, Elite, and Pro rows (the WHERE clause is applied before the window function runs). Player 3's earliest such date is in 2020, so all of their rows are excluded, while Player 4, who skipped straight to Elite in 2021, is still included. Using >= with an ISO date literal also keeps players who reached Advanced or higher on January 1st itself.
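To check the grouping logic described above (take the minimum over all of a player's Advanced, Elite and Pro rows rather than per rank), here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for PostgreSQL. The sample rows are hypothetical, modeled on Players 1 to 4 from the question, and the join to the player table is left out because only the rank table matters for the date logic.

```python
import sqlite3

# Hypothetical rows modeled on the four players described in the question.
RANK_ROWS = [
    (1, "Intermediate", "2020-06-01"),
    (1, "Advanced", "2021-01-02"),
    (2, "Advanced", "2021-01-02"),
    (3, "Advanced", "2020-12-30"),   # first reached Advanced in 2020
    (3, "Elite", "2021-01-10"),
    (4, "Intermediate", "2020-11-15"),
    (4, "Elite", "2021-01-04"),      # skipped Advanced entirely
]

def first_time_advanced_in_2021(rows):
    """Return (ID, first_time_adv) for players whose first Advanced-or-higher
    rank falls on or after 2021-01-01."""
    con = sqlite3.connect(":memory:")
    con.execute('CREATE TABLE rankachievement ("ID" INTEGER, "NewRank" TEXT, "EffectiveDate" TEXT)')
    con.executemany("INSERT INTO rankachievement VALUES (?, ?, ?)", rows)
    # Group by the player only, so one MIN spans Advanced, Elite and Pro together.
    query = """
        SELECT "ID", MIN("EffectiveDate") AS first_time_adv
        FROM rankachievement
        WHERE "NewRank" IN ('Advanced', 'Elite', 'Pro')
        GROUP BY "ID"
        HAVING MIN("EffectiveDate") >= '2021-01-01'  -- ISO dates compare correctly as text
        ORDER BY "ID"
    """
    result = con.execute(query).fetchall()
    con.close()
    return result

print(first_time_advanced_in_2021(RANK_ROWS))
```

Player 3 drops out because their overall minimum is 2020-12-30, while Player 4 stays in even though they never had an Advanced row.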

Related

Creating a column that returns date based on various conditions

Context: I'm fairly new to coding as a whole and am learning SQL. This is one of my practice/training sessions.
I'm trying to create a dimension table called "Employee Info" using the AdventureWorks2019 public database. Below is my attempted query to fetch all the data needed for this table.
SELECT
e.BusinessEntityID AS EmployeeID,
EEKey = ROW_NUMBER() OVER(ORDER BY(SELECT NULL)),
p.FirstName,
p.MiddleName,
p.LastName,
p.PersonType,
e.Gender,
e.JobTitle,
ep.Rate,
ep.PayFrequency,
e.BirthDate,
e.HireDate,
ep.RateChangeDate AS PayFrom,
e.MaritalStatus
From HumanResources.Employee AS e FULL JOIN
Person.Person AS p ON p.BusinessEntityID = e.BusinessEntityID FULL JOIN
Person.BusinessEntityAddress AS bea ON bea.BusinessEntityID = e.BusinessEntityID FULL JOIN
HumanResources.EmployeePayHistory AS ep ON ep.BusinessEntityID = e.BusinessEntityID
Where
PersonType='SP'
OR PersonType='EM'
ORDER BY EmployeeID;
Query result
Each employee (EE for short) will have a unique [EmployeeID]. The [EEKey] is simply used to mark ordinal numbers of each record.
EEs are paid different rates shown in the [Rate] column. There will be duplicate records if any EE receives a change in his/her pay rate.
There is currently a [PayFrom] column indicating the first date a pay rate is being applied to each record.
Current requirements: Create a [PayTo] column to the right of [PayFrom] that returns the last date each EE is paid their corresponding pay rate. There should be 2 scenarios:
If the EE being checked has multiple records (meaning his/her pay rate was adjusted at some point), [PayTo] returns the [PayFrom] date of the next record minus 1 day.
If the EE being checked has no additional record indicating a pay rate change, [PayTo] returns a fixed, specified date (say, 31/12/2070).
Example:
[EmployeeID] no. 4, Rob Walters, has 3 consecutive records in lines 4, 5, and 6. In line 4, the [PayTo] column is expected to return the [PayFrom] date of line 5 minus 1 day (2010-05-30). The same rule applies to line 5, which should return 2011-12-14.
As for Line 6, since there is no additional similar record to fetch data from, it will return the specified date (2070-12-31), using the same rule as every single-record EE.
As I mentioned, I am a fresher and completely new to coding, so my interpretation and method might be off. If you could point out what I'm doing wrong or show me what I should do to solve this issue, it would be much appreciated.
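One common way to express the two [PayTo] scenarios is a window function: LEAD() fetches the next [PayFrom] for the same employee, one day is subtracted, and COALESCE supplies the fixed end date when there is no next row. The sketch below runs that idea in SQLite through Python's sqlite3 module (window functions need SQLite 3.25 or later, which recent Python builds bundle). In SQL Server the equivalent expression would presumably be `COALESCE(DATEADD(day, -1, LEAD(ep.RateChangeDate) OVER (PARTITION BY ep.BusinessEntityID ORDER BY ep.RateChangeDate)), '2070-12-31')`. The two later dates for employee 4 follow the question's example; the first date, the rates, and employee 5 are made up for illustration.

```python
import sqlite3

# Hypothetical pay-history rows: (BusinessEntityID, Rate, RateChangeDate).
PAY_HISTORY = [
    (4, 10.00, "2008-01-01"),
    (4, 20.00, "2010-05-31"),
    (4, 30.00, "2011-12-15"),
    (5, 15.00, "2009-03-01"),
]

def pay_periods(rows, open_end="2070-12-31"):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE EmployeePayHistory (BusinessEntityID INTEGER, Rate REAL, RateChangeDate TEXT)")
    con.executemany("INSERT INTO EmployeePayHistory VALUES (?, ?, ?)", rows)
    query = """
        SELECT BusinessEntityID,
               Rate,
               RateChangeDate AS PayFrom,
               -- Next PayFrom for the same employee minus one day; LEAD is
               -- NULL on the employee's last row, so COALESCE falls back.
               COALESCE(
                   date(LEAD(RateChangeDate) OVER (
                            PARTITION BY BusinessEntityID
                            ORDER BY RateChangeDate), '-1 day'),
                   ?) AS PayTo
        FROM EmployeePayHistory
        ORDER BY BusinessEntityID, RateChangeDate
    """
    result = con.execute(query, (open_end,)).fetchall()
    con.close()
    return result

for row in pay_periods(PAY_HISTORY):
    print(row)
```

Because the window is partitioned by employee, a single-record EE automatically gets the fixed end date, so both scenarios are handled by one expression.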

PostgreSQL: How to do a SELECT clause with a condition iterating through a range of values?

Hi everyone. This is my first post on Stack Overflow, so sorry if it is clumsy in any way.
I work in Python and make SQL queries to a Google BigQuery database. The data structure looks like this:
sample of data
where time is represented in nanoseconds, and is not regularly spaced (it is captured real-time).
What I want to do is select, say, the mean price over a minute, for each minute in a time range that I would like to pass as a parameter.
This time range is currently a list of timestamps that I build externally, and I make sure they are separated by one minute each :
[1606170420000000000, 1606170360000000000, 1606170300000000000, 1606170240000000000, 1606170180000000000, ...]
My question is : how can I extract this list of mean prices given that list of time intervals ?
Ideally I'd expect something like
SELECT AVG(price) OVER( PARTITION BY (time BETWEEN time_intervals[i] AND time_intervals[i+1] for i in range(len(time_intervals))) )
FROM table_name
but I know that doesn't make sense...
My temporary solution is to aggregate many SELECT ... UNION DISTINCT clauses, one for each minute interval. But as you can imagine, this is not very efficient... (I need up to 60*24 = 1440 samples)
There may very well already be an answer to this question, but since I'm not even sure how to formulate it, I haven't found anything yet. Any link and/or tip would be a great help.
Many thanks in advance.
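First of all, your sample data is at nanosecond resolution, and you are looking for averages at minute (sixty-second) resolution, so integer-dividing the timestamp by 60,000,000,000 (the number of nanoseconds in a minute) gives a minute bucket to group on.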
Please try this:
select div(time, 60000000000) as minute,
pair,
avg(price) as avg_price
from your_table
group by div(time, 60000000000), pair
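The bucketing idea (divide the nanosecond timestamp by the number of nanoseconds in a minute and group on the quotient) can be checked locally with Python's sqlite3 module. This is only a sketch: SQLite's integer `/` stands in for BigQuery's DIV, the table rows are hypothetical, and the bucket is multiplied back up so the output shows each minute's starting timestamp.

```python
import sqlite3

NS_PER_MINUTE = 60_000_000_000  # 60 s * 1e9 ns

# Hypothetical ticks: (time in ns, pair, price).
TICKS = [
    (1606170420000000000, "BTCUSD", 10.0),
    (1606170420500000000, "BTCUSD", 20.0),
    (1606170480000000001, "BTCUSD", 30.0),
]

def minute_averages(rows):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE your_table (time INTEGER, pair TEXT, price REAL)")
    con.executemany("INSERT INTO your_table VALUES (?, ?, ?)", rows)
    # Integer division truncates toward zero, which equals floor for positive
    # timestamps, so all ticks within the same minute share one bucket.
    query = """
        SELECT (time / :ns) * :ns AS minute_start, pair, AVG(price) AS avg_price
        FROM your_table
        GROUP BY minute_start, pair
        ORDER BY minute_start, pair
    """
    result = con.execute(query, {"ns": NS_PER_MINUTE}).fetchall()
    con.close()
    return result

print(minute_averages(TICKS))
```

Minute buckets computed this way are aligned to the Unix epoch, which matches a tick list built from whole minutes like the one in the question.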
If you want to control the intervals as you said in your comment, then please try something like this (I do not have access to BigQuery):
with time_ivals as (
select tick,
lead(tick) over (order by tick) as next_tick
from unnest(
[1606170420000000000, 1606170360000000000,
1606170300000000000, 1606170240000000000,
1606170180000000000, ...]) as tick
)
select t.tick, y.pair, avg(y.price) as avg_price
from time_ivals t
join your_table y
on y.time >= t.tick
and y.time < t.next_tick
group by t.tick, y.pair;
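The interval-join version can be sketched the same way. SQLite has no UNNEST, so this assumes the caller's tick list is loaded into a small helper table instead; LEAD() then pairs each tick with the next one, and each price row joins to the half-open interval it falls in. Only the first three ticks from the question are used, and the price rows are hypothetical.

```python
import sqlite3

def interval_averages(ticks, rows):
    """Average price per pair between consecutive caller-supplied ticks."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE your_table (time INTEGER, pair TEXT, price REAL)")
    con.executemany("INSERT INTO your_table VALUES (?, ?, ?)", rows)
    # Stand-in for UNNEST: put the tick list into its own table.
    con.execute("CREATE TABLE ticks (tick INTEGER)")
    con.executemany("INSERT INTO ticks VALUES (?)", [(t,) for t in ticks])
    query = """
        WITH time_ivals AS (
            SELECT tick, LEAD(tick) OVER (ORDER BY tick) AS next_tick
            FROM ticks
        )
        SELECT t.tick, y.pair, AVG(y.price) AS avg_price
        FROM time_ivals t
        JOIN your_table y
          ON y.time >= t.tick
         AND y.time < t.next_tick  -- next_tick is NULL for the last tick, so it matches nothing
        GROUP BY t.tick, y.pair
        ORDER BY t.tick, y.pair
    """
    result = con.execute(query).fetchall()
    con.close()
    return result

# Ticks in the question's (newest-first) order; LEAD sorts them itself.
TICK_LIST = [1606170420000000000, 1606170360000000000, 1606170300000000000]
INTERVAL_ROWS = [
    (1606170300000000001, "BTCUSD", 10.0),
    (1606170330000000000, "BTCUSD", 20.0),
    (1606170400000000000, "BTCUSD", 40.0),
    (1606170450000000000, "BTCUSD", 100.0),  # after the last tick: dropped
]
print(interval_averages(TICK_LIST, INTERVAL_ROWS))
```

Note that the largest tick only closes the previous interval; to get an average for the minute starting there, one more tick a minute later would need to be appended to the list.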

Show the latest record 1 week after it's been added; after that, show a random record upon page refresh

My requirement is simple: I have a repeater control web part, and I want to apply a condition in the WHERE clause.
Condition: the latest record will show, 1 week after it's been added. After that, it will show randomly upon page refresh.
That is, if the record is more than 1 week old, it will show upon page refresh.
I made this query but it doesn't work:
(DocumentCreatedWhen >= dateadd(day, -7, convert(date, getdate())))
I'm a little confused by the "on page refresh" portion of your request. You said in the first part that "after that it will show randomly upon page refresh", then in the 2nd part said "if the record is greater than 1 week, it will show upon page refresh".
Which do you want?
To select only documents that are at least 1 week old, you would use
DATEDIFF(day, DocumentCreatedWhen , GETDATE()) >= 7
From there you can ORDER BY DocumentCreatedWhen ASC and set the top N to 1.
If you want to apply different logic on postback, you can use macros and the Visibility to make the "random" repeater visible on post back, and the other visible if it's not postback, or use macros to provide different WHERE conditions based on the postback status.
I could not find a default "IsPostback" macro available so you will have to create a custom macro that returns the current postback status.
Try these settings on your data source:
ORDER BY expression: age DESC, NEWID()
WHERE condition: DateDiff(day,DocumentCreatedWhen, GetDate()) >= 7
Columns: CASE WHEN DateDiff(day,DocumentCreatedWhen,GetDate()) = 7 THEN 1 ELSE 0 END AS age, *
This should mean that any document that is 7 days old appears at the top of your list, ready for you to set Select top N pages to 1. All other documents more than 7 days old will just be ordered randomly by the NEWID() function.
Obviously, where the * is in the columns, you should specify the columns that you need rather than leaving in a wildcard for performance reasons.
I just ran this out on the Dancing Goat sample and it does what you need (assuming I've understood correctly).
Edit:
Worth noting: anything that is 7 days old will stay there until it is... well... no longer exactly 7 days old. To make that work, you would somehow need to track that the record has been shown so that you can then exclude it from the result set, i.e. your COLUMNS become something like this:
CASE
-- DocumentHasBeenShown is a flag column you would have to add and maintain yourself
WHEN DateDiff(day, DocumentCreatedWhen, GetDate()) = 7 AND DocumentHasBeenShown = 0 THEN 1
ELSE 0
END AS age
, *
What you need to use is UNION:
SELECT TOP 1 * FROM
(
-- Get the latest record
Union
-- Get random record
) as Result
For example, for the MenuItem page type:
SELECT TOP 1 DocumentUrlPath, DocumentName, DocumentCreatedWhen FROM
(
-- latest for this week (Priority 1)
SELECT DocumentUrlPath, DocumentName, DocumentCreatedWhen, 1 AS Priority from (
select top 1 DocumentName, DocumentUrlPath, DocumentCreatedWhen FROM View_CONTENT_MenuItem_Joined
where DATEDIFF(day, DocumentCreatedWhen, GETDATE()) <= 7 Order BY DocumentCreatedWhen DESC) as LatestForThisWeek
UNION ALL
-- random (Priority 2, only wins when there is no document from this week)
SELECT DocumentUrlPath, DocumentName, DocumentCreatedWhen, 2 AS Priority from (
select top 1 DocumentName, DocumentUrlPath, DocumentCreatedWhen FROM View_CONTENT_MenuItem_Joined
ORDER BY NEWID()) as RandomizedRecords
) as Result
-- without an ORDER BY, TOP 1 is not guaranteed to pick the "latest" branch
ORDER BY Priority
There are a lot of subqueries, but this should give you the idea :)
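Here is a sketch of the same "recent document first, otherwise a random one" selection, run in SQLite through Python's sqlite3 module. It uses an explicit priority column and ORDER BY so the outcome doesn't depend on whatever row order UNION happens to produce. SQLite's LIMIT, julianday() and random() stand in for TOP, DATEDIFF and NEWID() (julianday differences are fractional days, unlike DATEDIFF's day-boundary count); the docs table, its columns, and the injected "now" parameter are hypothetical, chosen so the behavior can be tested deterministically.

```python
import sqlite3

QUERY = """
    SELECT name FROM (
        -- Priority 1: newest document created within the last 7 days, if any.
        SELECT name, 1 AS priority FROM (
            SELECT name FROM docs
            WHERE julianday(:now) - julianday(created) <= 7
            ORDER BY created DESC LIMIT 1)
        UNION ALL
        -- Priority 2: one random document as the fallback.
        SELECT name, 2 AS priority FROM (
            SELECT name FROM docs ORDER BY random() LIMIT 1)
    )
    ORDER BY priority
    LIMIT 1
"""

def pick_document(docs, now):
    """docs: list of (name, created ISO date); now: ISO date string."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE docs (name TEXT, created TEXT)")
    con.executemany("INSERT INTO docs VALUES (?, ?)", docs)
    row = con.execute(QUERY, {"now": now}).fetchone()
    con.close()
    return row[0] if row else None

print(pick_document([("old", "2024-01-01"), ("recent", "2024-03-08")], "2024-03-10"))
```

In a web page the "now" value would simply be the current date; it is a parameter here only to make the example repeatable.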

Select if last 6 visits happened within 30 days

Using Crystal 2013, my report groups by customer_id. The details are the times (datetime) that the customer has visited. I've figured out how to detect a minimum of 6 visits, but I want to check whether those 6 visits happened within 30 days.
I want to show all of the visits for the customer, but only show the groups that meet the criteria. Do I need to use WhilePrintingRecords to do a DateDiff between the first and 6th record for each group? How can I do this?
Here is what I have:
30 >= DateDiff ("DD", (if {Command.ROW} = 1 then {Command.Visit_START},
(if {Command.ROW} = 6 then{Command.Visit_STOP}))
Are you able to sort the group in Descending order? I have an idea but it will be less work for you if they're grouped newest to oldest.
WhileReadingRecords:
In each group you'll need to determine the 1st and 6th visit. (You're currently suppressing any groups with fewer than 6.) To do this, I would make a Shared Variable called Counter that increments by one every record and resets every time it reaches a new group. (Set it to zero in the Group Header.)
Next you'll need two more Shared Variables called FirstDate and SixthDate. These populate with the date value if Counter equals one or six respectively. Just like Counter you'll reset these every time the group changes.
WhilePrintingRecords:
If everything works, you should now have the two date values you need for the calculation. Add an additional clause to your current Suppression formula:
....
AND
DateDiff("d", SixthDate, FirstDate) <= 30
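The check the answer builds with shared variables boils down to this: sort a customer's visits newest first, then compare the 1st and 6th entries. Here is a minimal Python sketch of that logic, outside Crystal Reports. The visit dates are hypothetical, and "within 30 days" is assumed to mean the newest and sixth-newest visits are at most 30 days apart (a timedelta comparison, which may differ slightly from Crystal's day-boundary DateDiff).

```python
from datetime import datetime, timedelta

def last_six_within(visits, days=30):
    """True if the customer's six most recent visits span at most `days` days."""
    newest_first = sorted(visits, reverse=True)  # like sorting the group descending
    if len(newest_first) < 6:
        return False  # groups with fewer than 6 visits are suppressed anyway
    first, sixth = newest_first[0], newest_first[5]  # FirstDate and SixthDate
    return first - sixth <= timedelta(days=days)

visits = [datetime(2021, 1, d) for d in (1, 5, 10, 15, 20, 25, 28)]
print(last_six_within(visits))
```

Because the list is sorted newest first, FirstDate is the later date, which is why it is subtracted from rather than subtracted.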

Calculating change in leaders for baseball stats in MSSQL

Imagine I have an MSSQL 2005 table (bbstats) that updates weekly, showing
various cumulative categories of baseball accomplishments for a team:
week 1
Player H SO HR
Sammy 7 11 2
Ted 14 3 0
Arthur 2 15 0
Zach 9 14 3
week 2
Player H SO HR
Sammy 12 16 4
Ted 21 7 1
Arthur 3 18 0
Zach 12 18 3
I wish to highlight textually where there has been a change in leader for each category, so after week 2 there would be nothing to report on hits (H); Zach has joined Arthur with the most strikeouts (SO) at 18; and Sammy is the new leader in home runs (HR) with 4.
So I would want to set up a process something like
a) save the past data(week 1) as table bbstatsPrior,
b) updates the bbstats for the new results - I do not need assistance with this
c) compare the tables to find the player(s), including ties, with the max value for each column,
and output only where they differ
d) move onto next column and repeat
In any real-world example there would be significantly more columns to calculate.
Thanks
Responding to Brent's comments, I am really after any changes in the leaders for each category.
So I would have something like
select top 1 with ties player
from bbstatsPrior
order by H desc
and
select top 1 with ties player,H
from bbstats
order by H desc
I then want to compare the player from each query (do I need to use temp tables?). If they differ, I want to output the second SELECT statement. For the H category Ted is the leader in both tables, but for other categories there are changes between the weeks.
I can then loop through the columns using
select name from sys.all_columns sc
where sc.object_id=object_id('bbstats') and name <>'player'
If the number of stats doesn't change often, you could easily just write a single query to get this data. Join bbStats to bbStatsPrior where bbstatsprior.week < bbstats.week and bbstats.week=#weekNumber. Then just do a simple comparison between bbstats.Hits to bbstatsPrior.Hits to get your difference.
If the stats change often, you could use dynamic SQL to do this for all columns that match a certain pattern, or for a list of columns taken from sys.columns for that table.
You could add a column for each stat column to designate the leader using a correlated subquery to find the max value for that column and see if it's equal to the current record.
This might get you started, but I'd recommend posting what you currently have to achieve this and the community can help you from there.
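As a starting point, here is a sketch of the leader comparison described in the question: loop over the stat columns, find the player(s) equal to that column's MAX (a subquery, so ties are kept) in both the prior and current tables, and report only the columns where the set of leaders changed. It runs in SQLite via Python's sqlite3 module; PRAGMA table_info plays the role of the sys.all_columns loop, and the two weeks of data are taken from the question.

```python
import sqlite3

WEEK1 = [("Sammy", 7, 11, 2), ("Ted", 14, 3, 0), ("Arthur", 2, 15, 0), ("Zach", 9, 14, 3)]
WEEK2 = [("Sammy", 12, 16, 4), ("Ted", 21, 7, 1), ("Arthur", 3, 18, 0), ("Zach", 12, 18, 3)]

def leaders(con, table, column):
    # Identifiers cannot be bound as parameters, so they are formatted in;
    # here they come only from fixed table names and PRAGMA output.
    sql = f"SELECT player FROM {table} WHERE {column} = (SELECT MAX({column}) FROM {table})"
    return {row[0] for row in con.execute(sql)}

def leader_changes(prior_rows, current_rows):
    con = sqlite3.connect(":memory:")
    for table, rows in (("bbstatsPrior", prior_rows), ("bbstats", current_rows)):
        con.execute(f"CREATE TABLE {table} (player TEXT, H INTEGER, SO INTEGER, HR INTEGER)")
        con.executemany(f"INSERT INTO {table} VALUES (?, ?, ?, ?)", rows)
    # Every column except player is treated as a stat column.
    stat_columns = [r[1] for r in con.execute("PRAGMA table_info(bbstats)") if r[1] != "player"]
    changes = {}
    for column in stat_columns:
        before = leaders(con, "bbstatsPrior", column)
        after = leaders(con, "bbstats", column)
        if before != after:
            changes[column] = sorted(after)  # new leader(s), ties included
    con.close()
    return changes

print(leader_changes(WEEK1, WEEK2))
```

With the question's data this reports nothing for H, Arthur and Zach tied for SO, and Sammy for HR; turning that into sentences like "Zach has joined Arthur" would be a formatting step on top of it.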