Stuck with T-SQL Query

I'm having big trouble with a more complicated (or my mind makes it so) SQL query. I'll try to explain as best I can what is needed.
This is essentially how it looks: http://i.imgur.com/obWY7.png
The idea is that there will be sport games where kids team up with pros and they have a tournament. Now everything works fine until I get to the score part.
I need a query that returns a game's outcome in 4 columns:
winchildname, winproname, losechildname, loseproname.
That way it's easy to display in my C# project. If I use this query:
SELECT * FROM Games WHERE g_win = 1 OR g_lose = 1
This returns every game played by the child with k_id 1, but the outcome comes back as ints instead of actual text. I know how to inner join one int so it becomes text, but not two, because g_win and g_lose are both a k_id from the child table.
Can someone give me a hint? I feel like I'm overthinking this big time.

You can join multiple times on the same table by giving the table an alias:
WITH Names AS (
    SELECT k_id AS id, k_name, p_name
    FROM Child
    JOIN Pro ON k_pro = p_id
)
SELECT win.k_name AS win_k_name,
       win.p_name AS win_p_name,
       lose.k_name AS lose_k_name,
       lose.p_name AS lose_p_name
FROM Games
JOIN Names AS win ON g_win = win.id
JOIN Names AS lose ON g_lose = lose.id
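For reference, the same double-join works without the CTE by aliasing Child and Pro twice each. A minimal sketch, assuming the column names used above (k_id, k_name, k_pro in Child; p_id, p_name in Pro):
SELECT wk.k_name AS win_k_name,
       wp.p_name AS win_p_name,
       lk.k_name AS lose_k_name,
       lp.p_name AS lose_p_name
FROM Games
JOIN Child AS wk ON g_win = wk.k_id   -- wk/wp = winner's child and pro
JOIN Pro AS wp ON wk.k_pro = wp.p_id
JOIN Child AS lk ON g_lose = lk.k_id  -- lk/lp = loser's child and pro
JOIN Pro AS lp ON lk.k_pro = lp.p_id
Either way, the trick is that each alias acts as an independent copy of the table, so the winner's and loser's rows can be matched in the same query.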

Related

Does a "check whether data is ordered" aggregate function exist in Postgres?

I have a simple GROUP BY query in Postgres:
SELECT team, sum(score)
FROM games
GROUP BY team
HAVING (array_agg(score ORDER BY tag_num DESC))[1] = max(score)
ORDER BY sum DESC
This will return the teams with the top total scores. Additionally, it will only include teams whose final game of the season was their best. For instance, array_agg = {1,2,3,4,5} is allowed but array_agg = {1,2,3,4,2} will be omitted.
But what I really want is to filter by teams who continually improved as the season progressed. So, my above query is actually a bit of a hack.
How can I make sure that array_agg = {1,2,3,4,5} is allowed but array_agg = {1,2,3,2,5} is omitted?
Speed is of utmost importance to me. I'd rather stick with a fast "hack" than to get what I really want but it to end up being too slow.
Thanks in advance!
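One way to express the strict "continually improved" condition without the array hack is a window function: compare each score with the previous one and require every comparison to hold. A sketch, assuming (as in the query above) that tag_num gives the order of games within a season:
SELECT team, sum(score) AS total
FROM (
    SELECT team, score,
           score >= lag(score) OVER (PARTITION BY team ORDER BY tag_num) AS improved
    FROM games
) s
GROUP BY team
HAVING bool_and(coalesce(improved, true)) -- first game has no predecessor; use > instead of >= for strictly increasing
ORDER BY total DESC;
Whether this is faster or slower than the array_agg version is exactly the kind of thing that needs benchmarking on real data.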

Transact-SQL Ambiguous column name

I'm having trouble with the 'Ambiguous column name' issue in Transact-SQL, using Microsoft SQL Server 2012 Management Studio.
I've been looking through some of the answers already posted on Stack Overflow, but they don't seem to work for me, and parts of them I simply don't understand or lose the thread of.
Executing the following script:
USE CDD
SELECT Artist, Album_title, track_title, track_number, Release_Year, EAN_code
FROM Artists AS a INNER JOIN CD_Albumtitles AS c
ON a.artist_id = c.artist_id
INNER JOIN Track_lists AS t
ON c.title_id = t.title_id
WHERE track_title = 'bohemian rhapsody'
triggers the following error message:
Msg 209, Level 16, State 1, Line 3
Ambiguous column name 'EAN_code'.
Note that this is a CD database with artist names, album titles and track lists. Both the 'CD_Albumtitles' and 'Track_lists' tables have a column with identical EAN codes. The EAN code is an important international code used to uniquely identify CD albums, which is why I would like to keep using it.
You need to put the alias in front of the columns in your select list and your where clause. You're getting that error because one of the columns you're selecting comes from more than one of the tables in your join. Aliasing the column tells SQL Server which table to take it from:
SELECT a.Artist,c.Album_title,t.track_title,t.track_number,c.Release_Year,t.EAN_code
FROM Artists AS a INNER JOIN CD_Albumtitles AS c
ON a.artist_id = c.artist_id
INNER JOIN Track_lists AS t
ON c.title_id = t.title_id
WHERE t.track_title = 'bohemian rhapsody'
So choose one of the source tables, prefixing the field with its alias (or the table name):
SELECT Artist,Album_title,track_title,track_number,Release_Year,
c.EAN_code -- or t.EAN_code, which should retrieve the same value
By the way, try to prefix all the fields (in the select, the join, the group by, etc.); it's easier for maintenance.

Fastest way to update a Postgres table, given a set of unique column values?

I've been running into this same issue repeatedly when trying to execute Postgres updates. First I'll run a SELECT query, like so:
SELECT stock_number
FROM products
WHERE available = true
EXCEPT
SELECT stock_number
FROM new_inventory_list;
This selects the stock numbers of all products that are marked available in the current database but no longer appear in the newly downloaded inventory list. This command runs very quickly. However, virtually any method I use to apply the update seems to take at least ten minutes to run, slowing the server down in the process. For instance:
UPDATE products
SET available = false
WHERE stock_number IN (
SELECT stock_number
FROM products
WHERE available = true
AND stock_number IS NOT NULL
EXCEPT
SELECT stock_number
FROM new_inventory_list
);
There are usually at least 10,000 rows that need to be updated, and often a lot more if a supplier pushes a lot of new inventory at once. Additionally, we need to check for price updates. It's relatively fast and easy to get a list of stock numbers for products whose price has changed:
WITH overlap AS (
    SELECT stock_number
    FROM products
    INTERSECT
    SELECT stock_number
    FROM new_inventory_list
),
unchanged AS (
    SELECT stock_number, price
    FROM products
    INTERSECT
    SELECT stock_number, price
    FROM new_inventory_list
)
SELECT stock_number FROM overlap
EXCEPT
SELECT stock_number FROM unchanged;
For this query, I don't even try to use SQL commands to do the update; instead I pull the list out into a script, then run UPDATE on each modified value individually. It's slow, but still seems to be faster than any command I've tried that was strictly in SQL. Plus, with an external script, I can output the progress periodically and approximate how long it will run. Stock numbers are unique, although they're occasionally NULL (those should be ignored).
I feel like there has to be a much faster way of doing this, but so far I haven't had any luck figuring it out. Any thoughts?
edit:
I think I found a better solution to this problem than any that I've tried so far:
WITH removed AS (
SELECT stock_number
FROM products
WHERE available = true
EXCEPT
SELECT stock_number
FROM new_inventory_list
)
UPDATE products AS p
SET available = false
FROM removed
WHERE removed.stock_number = p.stock_number;
I hadn't considered the idea of using UPDATE and WITH together, and didn't even know it was possible until I read the UPDATE documentation for Postgres. Even though it's considerably faster, it still takes a few minutes to run, so to monitor it I run the above command in a loop with LIMIT 1000 at the end of the SELECT clause, printing a message to the console every time it successfully updates another block.
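For reference, the batched variant just described would look roughly like this (a sketch; in Postgres the LIMIT applies to the result of the whole EXCEPT):
WITH removed AS (
    SELECT stock_number
    FROM products
    WHERE available = true
    EXCEPT
    SELECT stock_number
    FROM new_inventory_list
    LIMIT 1000 -- one block per pass; updated rows drop out of the next pass
)
UPDATE products AS p
SET available = false
FROM removed
WHERE removed.stock_number = p.stock_number;
Each pass flips another 1000 rows to available = false, so looping until the update reports 0 affected rows processes the whole set.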
This query:
WITH removed AS (
SELECT stock_number
FROM products
WHERE available = true
EXCEPT
SELECT stock_number
FROM new_inventory_list
)
UPDATE products AS p
SET available = false
FROM removed
WHERE removed.stock_number = p.stock_number;
… will, I suspect, do a redundant join of the entire table with itself, and probably a poorly performing one at that, because of the EXCEPT clause in the WITH statement.
Think of it this way: suppose a products table with a million rows, around 250k marked as available, and 50k of those that don't appear in a 200k-item strong inventory list. The with query runs like this: 1) find the 50k rows in products that need to be updated; 2) then, for each row in products, check if the id is in those 50k rows in order to re-select those same 50k rows; 3) and update the row.
For improved performance, the update query should select the candidate rows from products that need to be updated directly, and use an anti-join to eliminate unwanted rows. The query #wildplasser posted earlier seems fine:
UPDATE products dst
SET available = false
WHERE available
AND NOT EXISTS (
SELECT 1
FROM new_inventory_list nx
WHERE nx.stock_number = dst.stock_number
);
Another point is the "about 50 columns, 20 of which are indexed" you mentioned in the comments: that will slow down updates considerably. Imagine: each row that gets updated needs to be written not just to the table itself, but also to each of those 20 indexes. Are you sure this shouldn't be normalized a bit more, and that you actually need every one of those indexes?
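As a quick way to see what you are paying for on every write, the indexes on the table can be listed from the catalog (a standard Postgres view; the table name products is taken from the question):
SELECT indexname, indexdef
FROM pg_indexes
WHERE tablename = 'products';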
Have you tried replacing the EXCEPT with an explicit anti-join inside the CTE?
WITH removed AS (
    SELECT p1.stock_number
    FROM products p1
    LEFT JOIN new_inventory_list n1
        ON p1.stock_number = n1.stock_number
    WHERE p1.available AND n1.stock_number IS NULL
)
UPDATE products AS p
SET available = false
FROM removed
WHERE removed.stock_number = p.stock_number;
I don't know how the EXCEPT is executed internally; perhaps the join form will retain some indexing for use in the UPDATE. Also, if available is usually false, I would add a partial index:
CREATE INDEX product_available ON products(stock_number) WHERE available;
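To verify that the planner actually uses such an index, EXPLAIN ANALYZE on the anti-join update shows the chosen plan. Note that EXPLAIN ANALYZE executes the statement, so wrap the test in a transaction and roll it back:
BEGIN;
EXPLAIN ANALYZE
UPDATE products dst
SET available = false
WHERE available
AND NOT EXISTS (
    SELECT 1
    FROM new_inventory_list nx
    WHERE nx.stock_number = dst.stock_number
);
ROLLBACK; -- discard the test update, keeping only the plan and timing output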

SQL Select rows by comparison of value to aggregated function result

I have a table listing (gameid, playerid, team, max_minions) and I want to get the players within each team that have the lowest max_minions (within each team, within each game). I.e. I want a list (gameid, team, playerid_with_lowest_minions) for each game/team combination.
I tried this:
SELECT * FROM MinionView GROUP BY gameid, team
HAVING MIN(max_minions) = max_minions;
Unfortunately, this doesn't work: it seems to select a random row from the available rows for each (gameid, team) and then apply the HAVING comparison to it. If the randomly selected row doesn't match, the group is simply skipped.
Using WHERE won't work either since you can't use aggregate functions within WHERE clauses.
LIMIT won't work since I have many more games and LIMIT limits the total number of rows returned.
Is there any way to do this without adding another table/view that contains (gameid, teamid, MIN(max_minions))?
Example data:
sqlite> SELECT * FROM MinionView;
gameid|playerid|team|champion|max_minions
21|49|100|Champ1|124
21|52|100|Champ2|18
21|53|100|Champ3|303
21|54|200|Champ4|356
21|57|200|Champ5|180
21|58|200|Champ6|21
64|49|100|Champ7|111
64|50|100|Champ8|208
64|53|100|Champ9|8
64|54|200|Champ0|226
64|55|200|ChampA|182
64|58|200|ChampB|15
...
Expected result (I mostly care about playerid, but included champion, max_minions here for better overview):
21|52|100|Champ2|18
21|58|200|Champ6|21
64|53|100|Champ9|8
64|58|200|ChampB|15
...
I'm using Sqlite3 under Python 3.1 if that matters.
This is in SQL Server, hopefully the syntax works for you too:
SELECT MV.*
FROM (
    SELECT team, gameid, min(max_minions) AS maxmin
    FROM MinionView
    GROUP BY team, gameid
) groups
JOIN MinionView MV
    ON MV.team = groups.team
    AND MV.gameid = groups.gameid
    AND MV.max_minions = groups.maxmin
In words: first you run the usual grouping query (the nested one). At that point you have the min value for each group, but you don't know which row it came from. To recover that, you join back to the original table, matching on the "keys" (team, game and the min value) to pick up the other columns as well.
Note that if a team has more than one member with the same max_minions value, all of those rows will be selected. If you only want one of them, that's probably a bit more complicated.
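Since the question is about SQLite specifically, there is also a SQLite-only shortcut (non-standard SQL, so treat this as a sketch to verify against your data): when a query contains a single min() or max() aggregate, SQLite takes the bare columns in the result from the row that produced that minimum or maximum:
-- SQLite-specific behavior: playerid and champion come from the row
-- holding the per-group minimum of max_minions
SELECT gameid, playerid, team, champion, min(max_minions) AS max_minions
FROM MinionView
GROUP BY gameid, team;
On other databases this query would be rejected or return an arbitrary row, so the join-based answer above is the portable form.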

JOIN versus subquery - performance sqlite3

So I got 3 tables:
Locations (LocationID [PK])
Track_Locations (TrackID [FK], LocationID [FK])
Tracks (TrackID [PK])
I'm building an app for iPhone and I was told that a JOIN would probably be faster than a subquery. Is this true?
I have searched around but can't find a straight answer to this and would like to know what is best to use in this case.
My queries in both cases would look like this:
Subquery:
SELECT LocationID, TimestampGPS, Longitude, Latitude, ...
FROM locations
WHERE LocationID
IN (SELECT LocationID FROM track_locations WHERE TrackID = ?)
Using JOIN:
SELECT l.LocationID, l.TimestampGPS, l.Longitude, l.Latitude, ...
FROM locations l
JOIN track_locations tl
ON (l.LocationID = tl.LocationID)
JOIN tracks t
ON (t.TrackID = tl.TrackID)
WHERE t.TrackID = ?
A JOIN is almost always faster than a subquery, because it results in a single query, whereas a subquery is two queries. Some databases, however, know how to optimize such simple subqueries into JOINs, though I doubt SQLite does.
That said, sometimes, given the structure of one's data, it is faster to do a subquery. The only real way to find out is to load your database with a likely collection of sample data from the wild and benchmark both approaches.
My suspicion is that it will be a wash, however. Most iPhone apps don't have enough rows in the database to make much difference.
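One cheap way to compare the two before a full benchmark is to ask SQLite for its query plan for each variant. A sketch using the queries from the question, with a concrete TrackID in place of the ? placeholder:
EXPLAIN QUERY PLAN
SELECT LocationID, TimestampGPS, Longitude, Latitude
FROM locations
WHERE LocationID IN (SELECT LocationID FROM track_locations WHERE TrackID = 1);

EXPLAIN QUERY PLAN
SELECT l.LocationID, l.TimestampGPS, l.Longitude, l.Latitude
FROM locations l
JOIN track_locations tl ON l.LocationID = tl.LocationID
JOIN tracks t ON t.TrackID = tl.TrackID
WHERE t.TrackID = 1;
The output shows whether each step scans a table or uses an index, which is usually enough to spot the expensive variant before timing it.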