CASE in JOIN not working PostgreSQL - postgresql

I got the following tables:
Teams
Matches
I want to get an output like:
matches.semana | teams.nom_equipo | teams.nom_equipo | Winner
1 AMERICA CRUZ AZUL AMERICA
1 SANTOS MORELIA MORELIA
1 LEON CHIVAS LEON
The columns teams.nom_equipo reference to matches.num_eqpo_lo & to matches.num_eqpo_v and at the same time they reference to the column teams.nom_equipo to get the name of each team based on their id
Edit: I have used the following:
SELECT m.semana, t_loc.nom_equipo AS LOCAL, t_vis.nom_equipo AS VISITANTE,
CASE WHEN m.goles_loc > m.goles_vis THEN 'home'
WHEN m.goles_vis > m.goles_loc THEN 'visitor'
ELSE 'tie'
END AS Vencedor
FROM matches AS m
JOIN teams AS t_loc ON (m.num_eqpo_loc = t_loc.num_eqpo)
JOIN teams AS t_vis ON (m.num_eqpo_vis = t_vis.num_eqpo)
ORDER BY m.semana;
But as you can see from the table Matches in row #5 from the goles_loc column (home team) & goles_vis (visitor) column, they have 2 vs 2 (number of goals - home vs visitor) being a tie but and when I run the code I get something that is not a tie:
Matches' score
Resultset from Select:
I also noticed that since the row #5 the names of both teams in the matches are not correct (both visitor & home team).
So, the Select brings correct data but in other order different than the original order (referring to the order from the table matches)
The order from the second week must be:
matches.semana | teams.nom_equipo | teams.nom_equipo | Winner
5 2 CRUZ AZUL TOLUCA TIE
6 2 MORELIA LEON LEON
7 2 CHIVAS SANTOS TIE
Row 8 from the Resultset must be Row # 5 and so on.
Any help would be really thanked!

When doing a SELECT which includes null for a column, that's the value it will always be, so winner in your case will never be populated.
Something like this is probably more along the lines of what you want:
SELECT m.semana, t_loc.nom_equipo AS loc_equipo, t_vis.nom_equipo AS vis_equipo,
CASE WHEN m.goles_loc - m.goles_vis > 0 THEN t_loc.nom_equipo
WHEN m.goles_vis - m.goles_loc > 0 THEN t_vis.nom_equipo
ELSE NULL
END AS winner
FROM matches AS m
JOIN teams AS t_loc ON (m.nom_eqpo_loc = t.num_eqpo)
JOIN teams AS t_vis ON (m.nom_eqpo_vis = t.num_eqpo)
ORDER BY m.semana;
Untested, but this should provide the general approach. Basically, you JOIN to the teams table twice, but using different conditions, and then you need to calculate the scores. I'm using NULL to indicate a tie, here.
Edit in response to comment from OP:
It's the same table -- teams -- but the JOINs produce different results, because the query uses different JOIN conditions in each JOIN.
The first JOIN, for t_loc, compares m.nom_eqpo_loc to t.num_eqpo. This means it gets the teams rows for the home team.
The second JOIN, for t_vis, compares m.nom_eqpo_vis to t.num_eqpo. This means it gets the teams rows for the visting team.
Therefore, in the CASE statement, t_loc refers to the home team, while t_vis refers to the visting one, enabling both to be used in the CASE statement, enabling the correct name to be found for winning.
Edit in response to follow-up comment from OP:
My original query was sorting by m.semana, which means other columns can appear in any order (essentially whichever Postgres feels is most efficient).
If you want the resulting table to be sorted exactly the same way as the matches table, then use the same ORDER BY tuple in its ORDER BY.
So, the ORDER BY clause would then become:
ORDER BY m.semana, m.nom_eqpo_loc, m.nom_eqpo_vis
Basically, the matches table PRIMARY KEY tuple.

Related

Replace correlated subquery with join

I'd like to replace the following ABAP OpenSQL snippet (in the where clause of a much bigger statement) with an equivalent join.
... AND tf~tarifart = ( SELECT MAX( tf2~tarifart ) FROM ertfnd AS tf2 WHERE tf2~tariftyp = e1~tariftyp AND tf2~bis >= e1~bis AND tf2~ab <= e1~ab ) ...
My motivation: Query migration to ABAP CDS views (basically plain SQL with in comparison somewhat reduced expressiveness). Alas, correlated subqueries and EXISTS statements are not supported.
I googled a bit and found a possible solution (last post) here https://archive.sap.com/discussions/thread/3824523
However, the proposal
Selecting MAX(value)
Your scenarion using inner join to first CDS view
doesn't work in my case.
tf.bis (and tf.ab) need to be in the selection list of the new view to limit the rhs of the join (new view) to the correct time frames.
Alas, there could be multiple (non overlapping) sub time frames (contained within [tf.ab, tf.bis]) with the same tf.tarifart.
Since these couldn't be grouped together, this results in multiple rows on the rhs.
The original query does not have a problem with that (no join -> no Cartesian product).
I hope the following fiddle (working example) clears things up a bit: http://sqlfiddle.com/#!9/8d1f48/3
Given these constraints, to me it seems that an equivalent join is indeed impossible. Suggestions or even confirmations?
select doc_belzart,
doc_tariftyp,
doc_ab,
doc_bis,
max(tar_tarifart)
from
(
select document.belzart as doc_belzart,
document.tariftyp as doc_tariftyp,
document.ab as doc_ab,
document.bis as doc_bis,
tariff.tarifart as tar_tarifart,
tariff.tariftyp as tar_tariftyp,
tariff.ab as tar_ab,
tariff.bis as tar_bis
from dberchz1 as document
inner join ertfnd as tariff
on tariff.tariftyp = document.tariftyp and
tariff.ab <= document.ab and
tariff.bis >= document.bis
) as max_tariff
group by doc_belzart,
doc_tariftyp,
doc_ab,
doc_bis
Translated in English, you seem to want to determine the max applicable tariff for a set of documents.
I'd refactor this into separate steps:
Determine all applicable tariffs, meaning all tariffs that completely cover the document's time interval. This will become your first CDS view, and in my answer forms the sub-query.
Determine for all documents the max applicable tariff. This will form your second CDS view, and in my answer forms the outer query. This one has the MAX / GROUP BY to reduce the result set to one per document.

Sphinx Mulit-Level Sort with Randomize

Here is my challenge with Sphinx Sort where I have Vendors who pay for premium placement and those who don't:
I already do a multi-level order including the PaidVendorStatus which is either 0 or 1 as:
order by PaidVendorStatus,Weight()
So in essence I end up with multiple sort groups:
PaidVendorStatus=1, Weight1
....
PaidVendorStatus=1, WeightN
PaidVendorStatus=0, Weight1
...
PaidVendorStatus=0, WeightN
The problem is I have three goals:
Randomly prioritize each vendor in any given sort group
Have each vendor's 'odds' of being randomly assigned top position be equal regardless of how many records they have returned in the group (so if Vendor A has 50 results and VendorB has 2 results they still both have 50% odds of being randomly assigned any given spot)
Ideally, maintain the same results order in any given search (so that if the user searches again the same order will be displayed
I've tried various solutions:
Select CRC32(Vendor) as RANDOM...Order by PaidVendorStatus,Weight(),RANDOM
which solves 2 and 3 except due to the nature of CRC32 ALWAYS puts the same vendor first (and second, third, etc.) so in essence does not solve the issue at all.
I tried making a sphinx sql_attr_string in my Sphinx Configuration which was a concatenation of Vendor and the record Title (Select... concat(Vendor,Title) as RANDOMIZER..)` and then used that to randomize
Select CRC32(RANDOMIZER) as RANDOM...
which solves 1 and 3 as now the Title field gets thrown in the randomization mis so that the same Vendor does not always get first billing. However, it fails at 2 since in essence I am only sorting by Title and thus Vendor B with two results now has a very low change of being sorted first.
In an ideal world naturally I could just order this way;
Order by PaidVendorStatus,Weight(),RAND(Vendor)
but that is not possible.
Any thoughts on this appreciated. I did btw check out as per Barry Hunter's suggestion this thread on UDF but unless I am not understanding it at all (possible) it does not seem to be the solution for this problem.
Well one idea is:
SELECT * FROM (
SELECT *,uniqueserial(vendor_id) AS sorter FROM index WHERE MATCH(...)
ORDER BY PaidVendorStatus DESC ,Weight() DESC LIMIT 1000
) ORDER BY sorter DESC, WEIGHT() DESC:
This exploits SPhixnes 'multiple sort' function with pysudeo subquery.
This works wors becasuse the inner query is sorted by PaidVendor first, so their items are fist. Which works to affect the ordr that unqique serial is called in.
Its NOT really 'randomising' the results as such, seems you jsut randomising them to mix up the vendors (so a single vendor doesnt domninate results. Uniqueserial works by 'spreading' the particular vendors results out - the results will tend to cycle through the vendors.
This is tricky as it exploits a relative undocumented sphinx feature - subqueries.
For the UDF see http://svn.geograph.org.uk/svn/modules/trunk/sphinx/
Still dont have an answer for your biased random (as in 2.)
but just remembered another feature taht can help with 3. - can supply s specific seed to the random number. Typically random generators are seeded from the current time, which gives ever changing values, But using a specific seed.
Seed is however a number, so need a predictable, but changing number. Could CRC the query?
... sphinx doesnt support expressions in the OPTION so would have to caculate the hash in the app.
<?php
$query = $db->Quote($_GET['q']);
$crc = crc32($query);
$sql = "SELECT id,IDIV(WEIGHT(),100) as i,RAND() as r FROM index WHERE MATCH($query)
ORDER BY PaidVendorStatus DESC,i DESC,r ASC OPTION random_seed=$crc";
If wanted the results to only slowly evolve, add the current date, so each day is a new selection...
$crc = crc32($query.date('Ymd'));

Fastest way to update a Postgres table, given a set of unique column values?

I've been running into this same issue repeatedly when trying to execute Postgres updates. First I'll run a SELECT query, like so:
SELECT stock_number
FROM products
WHERE available = true
EXCEPT
SELECT stock_number
FROM new_inventory_list;
This selects the stock numbers of all products that indicate that they're available in the current database, but no longer appear in the new list of inventory that's just been downloaded. This command runs very quickly. However, virtually any method I use to update this list seems to take at least ten minutes to run, slowing the server down in the process. For instance:
UPDATE products
SET available = false
WHERE stock_number IN (
SELECT stock_number
FROM products
WHERE available = true
AND stock_number IS NOT NULL
EXCEPT
SELECT stock_number
FROM new_inventory_list
);
There are usually at least 10,000 rows that need to be updated, and often a lot more if a supplier pushes a lot of new inventory at once. Additionally, we need to check for price updates. It's relatively fast and easy to get a list of stock numbers for products that have been changed in price:
WITH overlap AS (
SELECT stock_number
FROM products
INTERSECT
SELECT stock_number
FROM new_inventory_list
)
unchanged AS (
SELECT stock_number, price
FROM products
INTERSECT
SELECT stock_number, price
FROM new_inventory_list
)
SELECT * FROM overlap EXCEPT SELECT stock FROM unchanged;
For this query, I don't even try to use SQL commands to do it, instead I pull the list out into a script, then run UPDATE on each modified value individually. It's slow, but still seems to be faster than any command I've tried that was strictly in SQL. Plus, with an external script, I can output the progress periodically, so I approximate how long it will run for. Stock numbers are unique, although they're occasionally NULL. (Those should be ignored)
I feel like there has to be a much faster way of doing this, but so far I haven't had any luck figuring it out. Any thoughts?
edit:
I think I found a better solution to this problem than any that I've tried so far:
WITH removed AS (
SELECT stock_number
FROM products
WHERE available = true
EXCEPT
SELECT stock_number
FROM new_inventory_list
)
UPDATE products AS p
SET available = false
FROM removed
WHERE removed.stock_number = p.stock_number;
I hadn't considered the idea of using UPDATE and WITH together, and didn't even know it was possible until I read the UPDATE documentation for Postgres. Even though it's considerably faster, it still takes a few minutes to run, so to monitor it, I just run the above command in a loop, with LIMIT 1000 at the end of the SELECT clause, printing a message to the console every time it successfully updates another block.
This query:
WITH removed AS (
SELECT stock_number
FROM products
WHERE available = true
EXCEPT
SELECT stock_number
FROM new_inventory_list
)
UPDATE products AS p
SET available = false
FROM removed
WHERE removed.stock_number = p.stock_number;
… will, I trust, do a superfluous join on the entire table with itself. And probably a poorly performing one, at that, because of the except clause in the with statement.
Think of it this way: suppose a products table with a million rows, around 250k marked as available, and 50k of those that don't appear in a 200k-item strong inventory list. The with query runs like this: 1) find the 50k rows in products that need to be updated; 2) then, for each row in products, check if the id is in those 50k rows in order to re-select those same 50k rows; 3) and update the row.
For improved performance, the update query should select the candidate rows from products that need to be updated directly, and use an anti-join to eliminate unwanted rows. The query #wildplasser posted earlier seems fine:
UPDATE products dst
SET available = false
WHERE available
AND NOT EXISTS (
SELECT 1
FROM new_inventory_list nx
WHERE nx.stock_number = dst.stock_number
);
Another point is the "about 50 columns, 20 of which are indexed" you mentioned in the comments: That will slow down updates considerable. Imagine: each row that gets updated needs to be written into not just that table, but in an additional 20 tables. Are you sure this shouldn't be normalized a bit more and that you actually need each of those indexes?
Have you tried
WITH removed AS (
SELECT stock_number
FROM products p1
LEFT JOIN new_inventory_list n1
ON p1.stock_number=n1.stock_number
WHERE p1.available AND n1.stock_number IS NULL
)
I don't know how the EXCEPT is being done; perhaps this will retain some indexing for use in the UPDATE. Also, if available is usually false, I would add a partial index
CREATE INDEX product_available ON product(stock_number) WHERE available;

Adding a constraint to a Splunk search yields more result rows

Consider the following two Splunk searches:
index=a | join type=inner MyKey [search index=b]
and:
index=a | join type=inner MyKey [search index=b | where MyVal > 0]
Remarkably, the latter of the searches - the search whose subsearch has a constraint - has three times as many result rows as the former.
The Splunk documentation page for the join command suggests semantics that are close enough for the sake of argument to SQL's join:
A join is used to combine the results of a search and subsearch if specified fields are common to each. You can also join a table to itself using the selfjoin command.
This snippet is relevant to the type=inner argument:
A join is used to combine the results of a search and subsearch if specified fields are common to each. You can also join a table to itself using the selfjoin command.
Based on this information, I assume the two Splunk searches above should be equivalent to the following SQL, respectively:
SELECT *
FROM a
INNER JOIN b ON a.MyKey = b.MyKey
and:
SELECT *
FROM a
INNER JOIN b ON a.MyKey = b.MyKey
WHERE b.MyVal > 0
How is it possible that adding a constraint increases the number of result rows?
Interestingly, the following Splunk search produces a third result - one that matches what I got when I put the same data in an SQL database:
index=a | join type=outer MyKey [search index=b | eval hasmatch=1]
| where hasmatch=1
Some more notes:
the MyVal field has no duplicates in either table / index
I have verified that the raw events in Splunk's indexes match the raw source data in event counts and values for MyVal
the only search-time operations configured for the relevant sourcetypes in props.conf is a report to extract the fields based on a stanza in transforms.conf (the source data is in a CSV dialect)
Can anyone give me some clues here? As far as I'm concerned this behaviour is nonsensical.
I would suggest that you re-post this question over at the official Splunk forums (splunk-base.splunk.com). There are some very useful and experience "Splunkers" there (Employees, customers, partners, etc.) and it may be a better place for your question.
Cheers,
MHibbin

SQL Select rows by comparison of value to aggregated function result

I have a table listing (gameid, playerid, team, max_minions) and I want to get the players within each team that have the lowest max_minions (within each team, within each game). I.e. I want a list (gameid, team, playerid_with_lowest_minions) for each game/team combination.
I tried this:
SELECT * FROM MinionView GROUP BY gameid, team
HAVING MIN(max_minions) = max_minions;
Unfortunately, this doesn't seem to work as it seems to select a random row from the available rows for each (gameid, team) and then does the HAVING comparison. If the randomly selected row doesn't match, it's simply skipped.
Using WHERE won't work either since you can't use aggregate functions within WHERE clauses.
LIMIT won't work since I have many more games and LIMIT limits the total number of rows returned.
Is there any way to do this without adding another table/view that contains (gameid, teamid, MIN(max_minions))?
Example data:
sqlite> SELECT * FROM MinionView;
gameid|playerid|team|champion|max_minions
21|49|100|Champ1|124
21|52|100|Champ2|18
21|53|100|Champ3|303
21|54|200|Champ4|356
21|57|200|Champ5|180
21|58|200|Champ6|21
64|49|100|Champ7|111
64|50|100|Champ8|208
64|53|100|Champ9|8
64|54|200|Champ0|226
64|55|200|ChampA|182
64|58|200|ChampB|15
...
Expected result (I mostly care about playerid, but included champion, max_minions here for better overview):
21|52|100|Champ2|18
21|58|200|Champ6|21
64|53|100|Champ9|8
64|58|200|ChampB|15
...
I'm using Sqlite3 under Python 3.1 if that matters.
This is in SQL Server, hopefully the syntax works for you too:
SELECT
MV.*
FROM
(
SELECT
team, gameid, min(max_minions) as maxmin
FROM
MinionView
GROUP BY
team, gameid
) groups
JOIN MinionView MV ON
MV.team = groups.team
AND MV.gameid = groups.gameid
AND MV.max_minions = groups.maxmin
In words, first you make the usual grouping query (the nested one). At this point you have the min value for each group but you don't know to which row it belongs. For this you join with the original table and match the "keys" (team, game and min) to get the other columns as well.
Note that if a team will have more than one member with the same value for max_minions then all these rows will be selected. If you only want one of them then that's probably a bit more complicated.