I'm writing a T-SQL database to help me identify trends with football (soccer) statistics.
I've got several rows of data per match, each row signifying an event such as a goal, or a corner.
I'm trying to work out who wins the corner races for each match to 3, 5, 7 & 9 corners.
I just cannot even think of a method to employ for this. Can anyone help?
I've tried looking at case statements and temp tables, but just can't find a way to get what I want.
The data would look something like this:
Event Team Time (mins)
corner home 32
corner home 78
corner away 86
corner home 12
corner away 89
corner away 36
So I would want to find the 3rd, 5th, 7th and 9th home corner (if they exist) and the same for the away corners. For each race, the team that reached that corner count at the lower of the two times would win, if that makes sense.
In the above example the Home team would win the race to 3 because its third corner occurred in the 78th minute, whereas the third away corner was in the 89th minute.
With this query:
select e.*,
       (select count(*)
        from events
        where team = e.team and event = e.event and time < e.time) + 1 as counter
from events e
order by e.time
you get in chronological order all the events and for each event a counter up to that time. You can use it like this:
select top 1 t.team
from (
    select e.*,
           (select count(*)
            from events
            where team = e.team and event = e.event and time < e.time) + 1 as counter
    from events e
) t
where event = 'corner' and counter = 3
order by t.time
to get the winning team of the 3 corners.
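If you want all four races (3, 5, 7 and 9) in one pass, the same counter idea can be written with ROW_NUMBER(). Here is a rough sketch, run against SQLite from Python only so that it is self-contained (it needs SQLite 3.25 or later for window functions). The table layout and the function name corner_race_winners are my own assumptions, and it treats the whole table as a single match; with real data you would also partition by a match identifier. Two corners in the same minute would be ordered arbitrarily.

```python
import sqlite3

# Sample rows from the question: (event, team, minute).
EVENTS = [
    ("corner", "home", 32), ("corner", "home", 78), ("corner", "away", 86),
    ("corner", "home", 12), ("corner", "away", 89), ("corner", "away", 36),
]

RACE_SQL = """
WITH numbered AS (
    -- Number each team's corners in chronological order.
    SELECT team, time,
           ROW_NUMBER() OVER (PARTITION BY team ORDER BY time) AS counter
    FROM events
    WHERE event = 'corner'
),
ranked AS (
    -- For each race length, rank the teams by when they got there.
    SELECT counter, team, time,
           ROW_NUMBER() OVER (PARTITION BY counter ORDER BY time) AS finish
    FROM numbered
    WHERE counter IN (3, 5, 7, 9)
)
SELECT counter, team, time FROM ranked WHERE finish = 1 ORDER BY counter
"""


def corner_race_winners(rows):
    """Return (race_length, winning_team, minute) for every race that was completed."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (event TEXT, team TEXT, time INTEGER)")
    conn.executemany("INSERT INTO events VALUES (?, ?, ?)", rows)
    winners = conn.execute(RACE_SQL).fetchall()
    conn.close()
    return winners


print(corner_race_winners(EVENTS))  # [(3, 'home', 78)]
```

Races that neither team completed (5, 7 and 9 in the sample) simply produce no row.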
Something like this for the Home team:
SELECT *
FROM (
    SELECT *, ROW_NUMBER() OVER (ORDER BY Time) AS RowNum
    FROM Your_Table
    WHERE Event = 'corner' AND Team = 'Home'
) AS HomeCorners
WHERE RowNum IN (3, 5, 7, 9)
You'd create something similar for the away team...
I have a table that lists players of a game and the history of their level changes within the game (beginner, intermediate, advanced, elite, pro). I am trying to build a query that will accurately identify people who reached a level of advanced or higher for the first time so far in 2021 (it is possible for players to skip ranks, so I want the earliest date that they reached either the advanced, elite, or pro levels). I'm currently using this query:
SELECT *
FROM (
SELECT p."ID", ra."NewRank", MIN(ra."EffectiveDate") AS "first_time_adv"
FROM rankachievement ra
JOIN player p
ON ra."ID" = p."ID"
WHERE ra."NewRank" IN ('Advanced',
'Elite',
'Pro')
GROUP BY 1, 2) AS t
WHERE t."first_time_adv" > '1/1/2021'
ORDER BY 1
Right now, this is pulling in all of the people who reached the advanced level for the first time in 2021, but it is also pulling in some people that had previously reached advanced in 2020 and have now achieved an even higher level- I don't want to include these people, which is why I had used MIN with the date.
For example, it is correctly pulling in Players 1 and 2, who both reached advanced on January 2nd, and player 4, who reached elite on January 4th after skipping over the advanced level (went straight from intermediate to elite). But it is also pulling in Player 3, who reached advanced on December 30th, 2020 and then reached elite on January 10th- I do not want player 3 to be included because they reached advanced for the first time before 2021. How can I get my query to exclude people like this?
You're getting two results for Player 3... one with the advanced and one with the elite because you're grouping by NewRank. The one where Player 3 reached advanced gets removed from the result set by your WHERE t.first_time_adv > '1/1/2021' and the elite passes through. I suggest trying to use a FILTER and OVER with MIN().
Your results from the inner query include something like this:
id | new_rank | MIN(EffectiveDate)
---+----------+-------------------
1 | advanced | the min date for this record (12/30/2020)
1 | elite | the min date for this record (01/10/2021)
This is because you're getting the MIN while grouping both ID AND NewRank. You want the MIN over ALL records of that player. If you grouped by only ID you might get the behavior you were looking for but that would require you to remove the NewRank from the SELECT clause.
I suspect you want the rank in your final result set so try something like this:
WITH data AS
(
SELECT p."ID"
, ra."NewRank"
, ra."EffectiveDate"
, MIN(ra."EffectiveDate")
FILTER (WHERE ra."NewRank" = 'Advanced')
OVER (PARTITION BY p."ID") AS "first_time_adv"
FROM rankachievement ra
JOIN player p ON ra."ID" = p."ID"
WHERE ra."NewRank" IN ('Advanced', 'Elite', 'Pro')
)
SELECT *
FROM data d
WHERE d."first_time_adv" > '1/1/2021'
ORDER BY 1
;
Previously you were finding the MIN(EffectiveDate) for any rank IN ('Advanced', 'Elite', 'Pro')... now you're truly finding the EffectiveDate for when a player reached 'Advanced'.
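One thing to watch: filtering the MIN on 'Advanced' alone leaves first_time_adv NULL for a player who skipped straight to elite (Player 4 in the question), so that player drops out. A hedged alternative is to take each player's earliest row across all three ranks and then test that date. The sketch below runs on SQLite from Python purely to be self-contained (SQLite 3.25 or later for ROW_NUMBER()); the rows are made up to mirror the four players described, the dates are stored as ISO strings, and first_time_advanced_or_higher is a hypothetical name.

```python
import sqlite3

# Made-up rows mirroring the four players described in the question.
ROWS = [
    (1, "Advanced", "2021-01-02"),
    (2, "Advanced", "2021-01-02"),
    (3, "Advanced", "2020-12-30"),
    (3, "Elite", "2021-01-10"),
    (4, "Intermediate", "2020-11-15"),
    (4, "Elite", "2021-01-04"),
]

FIRST_REACHED_SQL = """
SELECT ID, NewRank, EffectiveDate
FROM (
    -- rn = 1 marks each player's earliest advanced-or-higher achievement.
    SELECT ID, NewRank, EffectiveDate,
           ROW_NUMBER() OVER (PARTITION BY ID ORDER BY EffectiveDate) AS rn
    FROM rankachievement
    WHERE NewRank IN ('Advanced', 'Elite', 'Pro')
) AS t
WHERE rn = 1 AND EffectiveDate >= '2021-01-01'
ORDER BY ID
"""


def first_time_advanced_or_higher(rows):
    """Players whose first advanced-or-higher rank falls on or after 2021-01-01."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE rankachievement (ID INTEGER, NewRank TEXT, EffectiveDate TEXT)"
    )
    conn.executemany("INSERT INTO rankachievement VALUES (?, ?, ?)", rows)
    result = conn.execute(FIRST_REACHED_SQL).fetchall()
    conn.close()
    return result


for row in first_time_advanced_or_higher(ROWS):
    print(row)
```

With these rows, Players 1, 2 and 4 come back and Player 3 does not, because Player 3's earliest qualifying row is from December 2020.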
I am trying to pivot using the crosstab function but am unable to meet my requirement. Is there a way to perform the crosstab dynamically, with a dynamic result set as well?
I have tried using the built-in crosstab function, but it does not meet my requirement.
select * from crosstab ('select item,cd, type, parts, part, cnt
from item
order by 1,2')
AS results (item text,cd text, SUM NUMERIC, AVG NUMERIC);
Sample Data:
ITEM    CD  TYPE  PARTS  PART  CNT
Item 1  A   AVG   4      1     10
Item 1  B   AVG   4      2     20
Item 1  C   AVG   4      3     30
Item 1  D   AVG   4      4     40
Item 1  A   SUM   4      1     10
Item 1  B   SUM   4      2     20
Item 1  C   SUM   4      3     30
Item 1  D   SUM   4      4     40
Expected Results:
ITEM CD PARTS TYPE_1 CNT_1 TYPE_1 CNT_1 TYPE_2 CNT_2 TYPE_2 CNT_2 TYPE_3 CNT_3 TYPE_3 CNT_3 TYPE_4 CNT_4 TYPE_4 CNT_4
Item 1 A 4 AVG 10 SUM 10 AVG 20 SUM 20 AVG 30 SUM 30 AVG 40 SUM 40
The PARTS value is based on a parameter passed by the user. If the user passes 2 for example, there will be 4 rows in the result set (2 parts for AVG and 2 parts of SUM).
Can I achieve this requirement using the CROSSTAB function, or does a custom SQL statement need to be developed?
I'm not following your data, so I can't offer examples based on it. But I have been looking at pivot/cross-tab features over the past few days, and I was looking at dynamic cross tabs just before seeing your post. I'm hoping that your question gets some good answers. I'll start off with a bit of background.
You can use the crosstab extension for standard cross tabs. What went wrong when you tried it? Here's an example I wrote for myself the other day with a bunch of comments and aliases for clarity. The pivot is looking at item scans to see where the scans were "to", like the warehouse or the floor.
/* Basic cross-tab example for crosstab (text) format of pivot command.
Notice that the embedded query has to return three columns, see the aliases.
#1 is the row label, it shows up in the output.
#2 is the category, what determines how many columns there are. *You have to work this out in advance to declare them in the return.*
#3 is the cell data, what goes in the cross tabs. Note that this form of the crosstab command may return NULL, and coalesce does not work.
To get rid of the null count/sums/whatever, you need crosstab (text, text).
*/
select *
from crosstab ('select
specialty_name as row_label,
scanned_to as column_splitter,
count(num_inst)::numeric as cell_data
from scan_table
group by 1,2
order by 1,2')
as scan_pivot (
row_label citext,
"Assembly" numeric,
"Warehouse" numeric,
"Floor" numeric,
"QA" numeric);
As a manual alternative, you can use a series of FILTER clauses. Here's an example that summarizes error_log records by day of the week. The "down" is the error name, the "across" (columns) are the days of the week.
select "error_name",
count(*) as "Overall",
count(*) filter (where extract(dow from "updated_dts") = 0) as "Sun",
count(*) filter (where extract(dow from "updated_dts") = 1) as "Mon",
count(*) filter (where extract(dow from "updated_dts") = 2) as "Tue",
count(*) filter (where extract(dow from "updated_dts") = 3) as "Wed",
count(*) filter (where extract(dow from "updated_dts") = 4) as "Thu",
count(*) filter (where extract(dow from "updated_dts") = 5) as "Fri",
count(*) filter (where extract(dow from "updated_dts") = 6) as "Sat"
from error_log
where "error_name" is not null
group by "error_name"
order by 1;
You can do the same thing with CASE, but FILTER is easier to write.
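For comparison, here is roughly what the CASE version looks like, trimmed to three day columns. It is only a sketch: it runs on SQLite from Python so it is self-contained, SQLite's strftime('%w', ...) stands in for extract(dow from ...) (both number Sunday as 0), and the sample rows and the name pivot_errors are invented for the example.

```python
import sqlite3

# Invented sample rows: (error_name, updated_dts).
ROWS = [
    ("timeout", "2024-01-01"),    # Monday
    ("timeout", "2024-01-02"),    # Tuesday
    ("timeout", "2024-01-08"),    # Monday
    ("disk_full", "2024-01-07"),  # Sunday
]

CASE_PIVOT_SQL = """
SELECT error_name,
       COUNT(*) AS Overall,
       SUM(CASE WHEN strftime('%w', updated_dts) = '0' THEN 1 ELSE 0 END) AS Sun,
       SUM(CASE WHEN strftime('%w', updated_dts) = '1' THEN 1 ELSE 0 END) AS Mon,
       SUM(CASE WHEN strftime('%w', updated_dts) = '2' THEN 1 ELSE 0 END) AS Tue
FROM error_log
WHERE error_name IS NOT NULL
GROUP BY error_name
ORDER BY 1
"""


def pivot_errors(rows):
    """Count errors per name, split into one column per day of the week."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE error_log (error_name TEXT, updated_dts TEXT)")
    conn.executemany("INSERT INTO error_log VALUES (?, ?)", rows)
    result = conn.execute(CASE_PIVOT_SQL).fetchall()
    conn.close()
    return result


print(pivot_errors(ROWS))  # [('disk_full', 1, 1, 0, 0), ('timeout', 3, 0, 2, 1)]
```

Each SUM(CASE ...) line does the job of one count(*) filter (where ...) line, which is why the FILTER form reads more cleanly.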
It looks like you want something basic, maybe the FILTER solution appeals? It's easier to read than calls to crosstab(), since that was giving you trouble.
FILTER may be slower than crosstab. Probably. (The crosstab extension is written in C, and I'm not sure how smart FILTER is about reading off indexes.) But I'm not sure as I haven't tested it out yet. (It's on my to do list, but I haven't had time yet.) I'd be super interested if anyone can offer results. We're on 11.4.
I wrote a client-side tool to build FILTER-based pivots over the past few days. You have to supply the down and across fields, an aggregate formula and the tool spits out the SQL. With support for coalesce for folks who don't want NULL, ROLLUP, TABLESAMPLE, view creation, and some other stuff. It was a fun project. Why go to that effort? (Apart from the fun part.) Because I haven't found a way to do dynamic pivots that I actually understand. I love this quote:
"Dynamic crosstab queries in Postgres has been asked many times on SO all involving advanced level functions/types. Consider building your needed query in application layer (Java, Python, PHP, etc.) and pass it in a Postgres connected query call. Recall SQL is a special-purpose, declarative type while app layers are general-purpose, imperative types." – Parfait
So, I wrote a tool to pre-calculate and declare the output columns. But I'm still curious about dynamic options in SQL. If that's of interest to you, have a look at these two items:
https://postgresql.verite.pro/blog/2018/06/19/crosstab-pivot.html
Flatten aggregated key/value pairs from a JSONB field?
Deep magic in both.
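To make the "build the query in the application layer" idea from the quote concrete, here is a tiny sketch of generating a FILTER-based pivot in Python. This is not the tool described above, just an illustration: build_filter_pivot, quote_ident and quote_literal are hypothetical helpers, the list of categories has to be known up front (for example from a prior select distinct), and the aggregate text is pasted in as-is, so it must come from trusted code and not from user input.

```python
def quote_ident(name):
    """Double-quote an SQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def quote_literal(value):
    """Single-quote a string literal, doubling any embedded single quotes."""
    return "'" + str(value).replace("'", "''") + "'"


def build_filter_pivot(table, down, across, categories, aggregate="count(*)"):
    """Return pivot SQL with one FILTER column per category value.

    `aggregate` is inserted verbatim (assumed to be trusted text).
    """
    columns = [quote_ident(down)]
    for category in categories:
        columns.append(
            f"{aggregate} filter (where {quote_ident(across)} = {quote_literal(category)})"
            f" as {quote_ident(str(category))}"
        )
    select_list = ",\n       ".join(columns)
    return (
        f"select {select_list}\n"
        f"from {quote_ident(table)}\n"
        f"group by {quote_ident(down)}\n"
        f"order by 1;"
    )


# Table and column names borrowed from the crosstab example earlier in this answer.
print(build_filter_pivot("scan_table", "specialty_name", "scanned_to",
                         ["Assembly", "Warehouse", "Floor", "QA"]))
```

The generated statement has a fixed, declared set of output columns, which is exactly what plain SQL wants and what makes truly dynamic pivots awkward inside the database.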
I am using t-sql.
I have 4 work trays and I would like a report that gives me the name of each work tray, plus the oldest item of post in it, plus a couple more fields. It needs to be limited to 4 rows - one for each work tray.
So at the moment I have this:
SELECT WorkTray, MIN(Date) AS [OldestDate], RefNo, NameofItem
FROM ...
GROUP BY WorkTray,RefNo, NameofItem
ORDER BY WorkTray,RefNo, NameofItem
However when I run this it gives me every item in each work tray, eg a report 100s of items long - I just want it to be limited to 4 rows of data, one for each work tray:
Work Tray Date RefNo NameofItem
A 1/2/15 25 Outstanding Bill
B 5/5/18 1000 Lost post
C 2/2/12 17 Misc
D 6/12/17 876 Misc
So I'm sure I'm going wrong somewhere with my GROUP BY - but I can't see where.
There is a trick for doing this that has been answered on Stack Overflow before. Here it is adapted to your query:
SELECT *
FROM (
    SELECT WorkTray, [Date] AS [OldestDate], RefNo, NameofItem,
           ROW_NUMBER() OVER (PARTITION BY WorkTray ORDER BY [Date]) AS rn
    FROM MyTable
) GroupedByTray
WHERE rn = 1
The PARTITION BY tells it to restart the row numbering for each work tray, and the ORDER BY works similarly to the normal ORDER BY clause, so within each tray the oldest item gets rn = 1. Assuming you have only 4 work trays (A - D), the "WHERE rn = 1" part will return only the first row for each of WorkTrays A - D.
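If you want to see the effect on a small data set, here is a sketch that runs the same pattern on SQLite from Python (SQLite 3.25 or later for ROW_NUMBER()). The rows and dates are made up, loosely based on the expected output in the question, and oldest_item_per_tray is a hypothetical name.

```python
import sqlite3

# Made-up rows: (WorkTray, Date, RefNo, NameofItem), dates as ISO strings.
ITEMS = [
    ("A", "2015-02-01", 25, "Outstanding Bill"),
    ("A", "2019-03-04", 31, "Misc"),
    ("B", "2018-05-05", 1000, "Lost post"),
    ("B", "2020-07-09", 1002, "Misc"),
    ("C", "2012-02-02", 17, "Misc"),
    ("D", "2017-12-06", 876, "Misc"),
    ("D", "2018-01-15", 900, "Lost post"),
]

OLDEST_SQL = """
SELECT WorkTray, "Date" AS OldestDate, RefNo, NameofItem
FROM (
    -- Numbering restarts at 1 for every tray; the oldest item gets rn = 1.
    SELECT *, ROW_NUMBER() OVER (PARTITION BY WorkTray ORDER BY "Date") AS rn
    FROM MyTable
) AS GroupedByTray
WHERE rn = 1
ORDER BY WorkTray
"""


def oldest_item_per_tray(rows):
    """Return one row per work tray: the item with the earliest date."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        'CREATE TABLE MyTable (WorkTray TEXT, "Date" TEXT, RefNo INTEGER, NameofItem TEXT)'
    )
    conn.executemany("INSERT INTO MyTable VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(OLDEST_SQL).fetchall()
    conn.close()
    return result


for row in oldest_item_per_tray(ITEMS):
    print(row)
```

Seven input rows come back as four, one per tray, and RefNo and NameofItem belong to the same row as the oldest date, which is what the GROUP BY version could not give you.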
My requirement is simple: I have a repeater control webpart, and I want to apply a condition in the WHERE clause.
Condition: The latest record will show 1 week after it's been added. After that it will show randomly upon page refresh.
That means if a record is more than 1 week old, then it will show upon page refresh.
I made this query but it doesn't work:
(DocumentCreatedWhen >= dateadd(day, -7, convert(date, getdate())))
I'm a little confused on the "on page refresh" portion of your request. You said in your first part that "after that it will show randomly upon page refresh" then on the 2nd part said "if the record is greater than 1 week, it will show upon page refresh"
Which do you want?
To keep only records that are at least 1 week old, you would do
DATEDIFF(day, DocumentCreatedWhen , GETDATE()) >= 7
From there you can do an ORDER BY DocumentCreatedWhen asc, and a Top # of 1.
If you want to apply different logic on postback, you can use macros and the Visibility to make the "random" repeater visible on post back, and the other visible if it's not postback, or use macros to provide different WHERE conditions based on the postback status.
I could not find a default "IsPostback" macro available so you will have to create a custom macro that returns the current postback status.
Try these settings on your data source:
ORDER BY expression: age DESC, NEWID()
WHERE condition: DateDiff(day,DocumentCreatedWhen, GetDate()) >= 7
Columns: CASE WHEN DateDiff(day,DocumentCreatedWhen,GetDate()) = 7 THEN 1 ELSE 0 END AS age, *
This should mean that any document that is 7 days old appears at the top of your list, ready for you to set Select top N pages to 1. All other documents more than 7 days old will just be ordered randomly by the NEWID() function.
Obviously, where the * is in the columns, you should specify the columns that you need rather than leaving in a wildcard for performance reasons.
I just ran this out on the Dancing Goat sample and it does what you need (assuming I've understood correctly).
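To illustrate the ordering trick outside Kentico, here is a rough sketch on SQLite from Python. It is an approximation, not the web part configuration itself: random() stands in for NEWID(), a julianday() difference stands in for DateDiff(day, ...), "today" is passed in as a parameter so the result is repeatable, and the docs table, its rows and pick_document are invented for the example.

```python
import sqlite3

# Invented documents: (name, created date as ISO string).
DOCS = [
    ("fresh", "2024-03-12"),     # 3 days old on 2024-03-15, filtered out
    ("week_old", "2024-03-08"),  # exactly 7 days old
    ("older_a", "2024-02-01"),
    ("older_b", "2024-01-10"),
]

PICK_SQL = """
SELECT name
FROM (
    SELECT name,
           CAST(julianday(:today) - julianday(created) AS INTEGER) AS age_days
    FROM docs
) AS aged
WHERE age_days >= 7
-- Documents exactly 7 days old sort first; everything else is shuffled.
ORDER BY CASE WHEN age_days = 7 THEN 1 ELSE 0 END DESC, random()
LIMIT 1
"""


def pick_document(docs, today):
    """Return the name of the document to show, or None if nothing qualifies."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE docs (name TEXT, created TEXT)")
    conn.executemany("INSERT INTO docs VALUES (?, ?)", docs)
    row = conn.execute(PICK_SQL, {"today": today}).fetchone()
    conn.close()
    return row[0] if row else None


print(pick_document(DOCS, "2024-03-15"))  # week_old
```

On any day when no document is exactly 7 days old, the first sort key is 0 for every row and the pick is effectively random among the older documents.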
Edit:
Worth noting: anything that is 7 days old will stay there until it is... well... no longer exactly 7 days old. To make that work, you would somehow need to track that the record has been shown so that you can then exclude it from the result set, i.e. your Columns setting becomes something like this:
CASE
WHEN DateDiff(day,DocumentCreatedWhen,GetDate()) = 7 AND DocumentHasBeenShown = 0 THEN 1
ELSE 0
END AS age
, *
What you need to use is a UNION:
SELECT TOP 1 * FROM
(
-- Get the latest record
Union
-- Get random record
) as Result
For example, if you get MenuItem:
SELECT TOP 1 * FROM
(
    -- latest for this week
    SELECT DocumentUrlPath, DocumentName, DocumentCreatedWhen FROM (
        SELECT TOP 1 DocumentName, DocumentUrlPath, DocumentCreatedWhen
        FROM View_CONTENT_MenuItem_Joined
        WHERE DATEDIFF(day, DocumentCreatedWhen, GETDATE()) <= 7
        ORDER BY DocumentCreatedWhen DESC) AS LatestForThisWeek
    UNION
    -- random
    SELECT DocumentUrlPath, DocumentName, DocumentCreatedWhen FROM (
        SELECT TOP 1 DocumentName, DocumentUrlPath, DocumentCreatedWhen
        FROM View_CONTENT_MenuItem_Joined
        ORDER BY NEWID()) AS RandomizedRecords
) AS Result
There are lots of subqueries, but this will give you the idea :)
Imagine I have a MSSQL 2005 table(bbstats) that updates weekly showing
various cumulative categories of baseball accomplishments for a team
week 1
Player H SO HR
Sammy 7 11 2
Ted 14 3 0
Arthur 2 15 0
Zach 9 14 3
week 2
Player H SO HR
Sammy 12 16 4
Ted 21 7 1
Arthur 3 18 0
Zach 12 18 3
I wish to highlight textually where there has been a change in leader for each category
so after week 2 there would be nothing to report on hits(H); Zach has joined Arthur with most strikeouts(SO) at
18; and Sammy is new leader in homeruns(HR) with 4
So I would want to set up a process something like
a) save the past data(week 1) as table bbstatsPrior,
b) updates the bbstats for the new results - I do not need assistance with this
c) compare between the tables for the player(s with ties) with max value for each column
and spits out only where they differ
d) move onto next column and repeat
In any real world example there would be significantly more columns to calculate for
Thanks
Responding to Brent's comments, I am really after any changes in the leaders for each category.
So I would have something like
select top 1 with ties player
from bbstatsPrior
order by H desc
and
select top 1 with ties player,H
from bbstats
order by H desc
I then want to compare the player from each query (do I need to use temp tables?). If they differ, I want to output the second select statement. For the H category Ted is the leader in both tables, but for other categories there are changes between the weeks.
I can then loop through the columns using
select name from sys.all_columns sc
where sc.object_id=object_id('bbstats') and name <>'player'
If the number of stats doesn't change often, you could easily just write a single query to get this data. Join bbStats to bbStatsPrior where bbstatsprior.week < bbstats.week and bbstats.week=#weekNumber. Then just do a simple comparison between bbstats.Hits to bbstatsPrior.Hits to get your difference.
If the stats change often, you could use dynamic SQL to do this for all columns that match a certain pattern or are in a list of columns based on sys.columns for that table?
You could add a column for each stat column to designate the leader using a correlated subquery to find the max value for that column and see if it's equal to the current record.
This might get you started, but I'd recommend posting what you currently have to achieve this and the community can help you from there.
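As a starting point, here is a sketch of the compare-the-leaders step using the week 1 and week 2 numbers from the question. It runs on SQLite from Python only so it is self-contained; the WHERE stat = MAX(stat) subquery plays the role of TOP 1 WITH TIES, the loop over a fixed tuple of column names stands in for reading them from sys.all_columns, and leaders and leader_changes are hypothetical names. The table and column names are spliced into the SQL text, so they must come from a fixed list as they do here.

```python
import sqlite3

# Cumulative stats from the question: (player, H, SO, HR).
WEEK1 = [("Sammy", 7, 11, 2), ("Ted", 14, 3, 0), ("Arthur", 2, 15, 0), ("Zach", 9, 14, 3)]
WEEK2 = [("Sammy", 12, 16, 4), ("Ted", 21, 7, 1), ("Arthur", 3, 18, 0), ("Zach", 12, 18, 3)]


def leaders(conn, table, column):
    """Return [(player, value)] for everyone tied for the max of `column`."""
    sql = (
        f"SELECT player, {column} FROM {table} "
        f"WHERE {column} = (SELECT MAX({column}) FROM {table}) ORDER BY player"
    )
    return conn.execute(sql).fetchall()


def leader_changes(prior_rows, current_rows, stat_columns=("H", "SO", "HR")):
    """Map each stat whose set of leaders changed to its current leaders."""
    conn = sqlite3.connect(":memory:")
    for table, rows in (("bbstatsPrior", prior_rows), ("bbstats", current_rows)):
        conn.execute(
            f"CREATE TABLE {table} (player TEXT, H INTEGER, SO INTEGER, HR INTEGER)"
        )
        conn.executemany(f"INSERT INTO {table} VALUES (?, ?, ?, ?)", rows)
    changes = {}
    for column in stat_columns:
        before = {player for player, _ in leaders(conn, "bbstatsPrior", column)}
        now = leaders(conn, "bbstats", column)
        if before != {player for player, _ in now}:
            changes[column] = now
    conn.close()
    return changes


print(leader_changes(WEEK1, WEEK2))
# {'SO': [('Arthur', 18), ('Zach', 18)], 'HR': [('Sammy', 4)]}
```

Comparing sets of names means a new tie (Zach joining Arthur on strikeouts) counts as a change, while Ted staying alone at the top of hits does not.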