How to use a table function outputting 2 columns in an update statement - tsql

Pseudocode:
Get all projects
Use a table function to get all related parts, which uses project id as input and returns 0..* part ids
Copy a value from project to all found part ids
Datamodel:
Table projects consists of fields pj_id and pj_desc
Table parts consists of fields pj_desc_copy and prt_id
There's a function LookupRelationShips(string) that outputs multiple columns (rel_type and rel_id); when rel_type = 2, rel_id is a prt_id.
My best attempt is this, but it won't let me use the output of the subselect:
UPDATE parts
SET pj_desc_copy = rel.pj_desc
from parts prt
INNER JOIN
(select (select rel_type, rel_id, pj.pj_desc
from LookupRelationShips(pj.pj_id)
where rel_type = 2)
from projects pj) as rel
ON rel.rel_id = prt.prt_id
Use case/restrictions:
This is a one-time statement to update all current parts. From this point onwards project CRUD will result in syncing parts, but using the application to bulk update previous projects is less than ideal (built-in timeouts, lots of overhead, large dataset).

I think your query should be as follows. You can use CROSS APPLY on the function:
UPDATE prt
SET pj_desc_copy = pj.pj_desc
FROM projects pj
CROSS APPLY LookupRelationShips(pj.pj_id) rel
INNER JOIN parts prt ON prt.prt_id = rel.rel_id
WHERE rel.rel_type = 2

UPDATE parts
SET pj_desc_copy = pj.pj_desc
FROM projects pj
CROSS APPLY LookupRelationShips(pj.pj_id) rel
RIGHT JOIN parts prt on rel.rel_id = prt.prt_id
WHERE rel.rel_type = 2
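
To make the row flow of the CROSS APPLY pattern above easier to follow, here is a minimal Python sketch of the same logic. It is not T-SQL and does not touch a database: `lookup_relationships`, the `RELATIONSHIPS` data, and the sample projects and parts are all hypothetical stand-ins invented for illustration.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical stand-in for the table function: project id -> rows of (rel_type, rel_id).
RELATIONSHIPS: Dict[str, List[Tuple[int, int]]] = {
    "P1": [(1, 13), (2, 10), (2, 11)],  # the rel_type 1 row must be ignored
    "P2": [(2, 12)],
    "P3": [],                           # no related rows: contributes nothing
}


def lookup_relationships(pj_id: str) -> List[Tuple[int, int]]:
    """Return zero or more (rel_type, rel_id) rows for one project."""
    return RELATIONSHIPS.get(pj_id, [])


def apply_update(projects: Dict[str, str],
                 parts: Dict[int, Optional[str]]) -> Dict[int, Optional[str]]:
    """Mimic: UPDATE ... FROM projects CROSS APPLY f(pj_id) JOIN parts WHERE rel_type = 2."""
    for pj_id, pj_desc in projects.items():                   # FROM projects pj
        for rel_type, rel_id in lookup_relationships(pj_id):  # CROSS APPLY f(pj.pj_id)
            if rel_type == 2 and rel_id in parts:             # WHERE filter + join to parts
                parts[rel_id] = pj_desc                       # SET pj_desc_copy = pj.pj_desc
    return parts


projects = {"P1": "Bridge", "P2": "Tunnel", "P3": "Tower"}
parts = {10: None, 11: None, 12: None, 13: None}
updated = apply_update(projects, parts)
print(updated)
```

The function is evaluated once per project row, which is what CROSS APPLY does; projects for which it returns no rows simply drop out of the result.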

Related

How does PostgreSQL interpret these two join statements?

I have a question about two very similar PostgreSQL statements:
UPDATE classes SET year = 1
FROM professors WHERE (professors.class = classes.class)
AND professors.name = 'Smith'
This one seems to inner join the classes table and the professors table, and update only the record in classes where the corresponding professor's name is Smith.
UPDATE classes c SET year = 1
FROM classes cl JOIN professors on (professors.class_id = cl.class_id)
WHERE professors.name = 'Smith'
This updates every single record in classes. Why is this statement different from the first one?
In the second, you are referring to classes twice. These are two separate references, and the c and cl references are not correlated. In fact, there are no conditions on c, so all rows are updated.
You could add a correlation condition:
UPDATE classes
SET year = 1
FROM classes cl JOIN
professors p
ON p.class_id = cl.class_id
WHERE p.name = 'Smith' AND cl.class_id = classes.class_id;
However, the JOIN is unnecessary and the first query is a better approach (for this purpose).
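
The difference can be modelled in a few lines of Python. This is only a simplified simulation of the semantics described above (a target row is updated when at least one row of the FROM list satisfies the WHERE clause), not PostgreSQL itself; the sample rows and the helper `updated_ids` are made up for illustration.

```python
from itertools import product

classes = [{"class_id": 1}, {"class_id": 2}, {"class_id": 3}]
professors = [{"name": "Smith", "class_id": 1}, {"name": "Jones", "class_id": 2}]

# FROM classes cl JOIN professors p ON p.class_id = cl.class_id
from_rows = [(cl, p) for cl in classes for p in professors
             if p["class_id"] == cl["class_id"]]


def updated_ids(targets, from_rows, where):
    """A target row is updated when at least one FROM row satisfies WHERE."""
    return sorted({t["class_id"]
                   for t, (cl, p) in product(targets, from_rows)
                   if where(t, cl, p)})


# Second statement: WHERE mentions only p, never the update target.
uncorrelated = updated_ids(
    classes, from_rows, lambda t, cl, p: p["name"] == "Smith")

# Corrected statement: adds cl.class_id = classes.class_id.
correlated = updated_ids(
    classes, from_rows,
    lambda t, cl, p: p["name"] == "Smith" and cl["class_id"] == t["class_id"])

print(uncorrelated, correlated)
```

Without the correlation condition, the single Smith row in the FROM list pairs with every target row, so every class is updated.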

Replace correlated subquery with join

I'd like to replace the following ABAP OpenSQL snippet (in the where clause of a much bigger statement) with an equivalent join.
... AND tf~tarifart = ( SELECT MAX( tf2~tarifart ) FROM ertfnd AS tf2 WHERE tf2~tariftyp = e1~tariftyp AND tf2~bis >= e1~bis AND tf2~ab <= e1~ab ) ...
My motivation: migrating the query to ABAP CDS views (basically plain SQL with somewhat reduced expressiveness by comparison). Alas, correlated subqueries and EXISTS statements are not supported.
I googled a bit and found a possible solution (last post) here https://archive.sap.com/discussions/thread/3824523
However, the proposal
Selecting MAX(value)
Your scenario using inner join to first CDS view
doesn't work in my case.
tf.bis (and tf.ab) need to be in the selection list of the new view to limit the rhs of the join (new view) to the correct time frames.
Alas, there could be multiple (non overlapping) sub time frames (contained within [tf.ab, tf.bis]) with the same tf.tarifart.
Since these couldn't be grouped together, this results in multiple rows on the rhs.
The original query does not have a problem with that (no join -> no Cartesian product).
I hope the following fiddle (working example) clears things up a bit: http://sqlfiddle.com/#!9/8d1f48/3
Given these constraints, to me it seems that an equivalent join is indeed impossible. Suggestions or even confirmations?
select doc_belzart,
doc_tariftyp,
doc_ab,
doc_bis,
max(tar_tarifart)
from
(
select document.belzart as doc_belzart,
document.tariftyp as doc_tariftyp,
document.ab as doc_ab,
document.bis as doc_bis,
tariff.tarifart as tar_tarifart,
tariff.tariftyp as tar_tariftyp,
tariff.ab as tar_ab,
tariff.bis as tar_bis
from dberchz1 as document
inner join ertfnd as tariff
on tariff.tariftyp = document.tariftyp and
tariff.ab <= document.ab and
tariff.bis >= document.bis
) as max_tariff
group by doc_belzart,
doc_tariftyp,
doc_ab,
doc_bis
Translated into English, you seem to want to determine the max applicable tariff for a set of documents.
I'd refactor this into separate steps:
Determine all applicable tariffs, meaning all tariffs that completely cover the document's time interval. This will become your first CDS view, and in my answer forms the sub-query.
Determine for all documents the max applicable tariff. This will form your second CDS view, and in my answer forms the outer query. This one has the MAX / GROUP BY to reduce the result set to one per document.
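
The two steps can be tried out with a small Python script using SQLite in place of the ABAP CDS layer. This is only a sketch under assumptions: the view name `applicable_tariffs`, the sample rows, and the use of ISO date strings are invented for illustration, and SQLite is not ABAP CDS, so syntax details will differ there.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dberchz1 (belzart TEXT, tariftyp TEXT, ab TEXT, bis TEXT);
    CREATE TABLE ertfnd   (tariftyp TEXT, tarifart TEXT, ab TEXT, bis TEXT);

    -- One document and four tariffs; two covering tariffs share tarifart 'B'.
    INSERT INTO dberchz1 VALUES ('D1', 'T1', '2020-03-01', '2020-03-31');
    INSERT INTO ertfnd VALUES
        ('T1', 'A', '2020-01-01', '2020-12-31'),
        ('T1', 'B', '2020-02-01', '2020-06-30'),
        ('T1', 'B', '2020-03-01', '2020-04-30'),
        ('T1', 'C', '2020-04-01', '2020-12-31');  -- starts too late, not applicable

    -- Step 1: all tariffs that completely cover the document's interval.
    CREATE VIEW applicable_tariffs AS
    SELECT document.belzart  AS doc_belzart,
           document.tariftyp AS doc_tariftyp,
           document.ab       AS doc_ab,
           document.bis      AS doc_bis,
           tariff.tarifart   AS tar_tarifart
    FROM dberchz1 AS document
    INNER JOIN ertfnd AS tariff
       ON tariff.tariftyp = document.tariftyp
      AND tariff.ab  <= document.ab
      AND tariff.bis >= document.bis;
""")

applicable_count = conn.execute(
    "SELECT COUNT(*) FROM applicable_tariffs").fetchone()[0]

# Step 2: reduce to one row per document with MAX / GROUP BY.
rows = conn.execute("""
    SELECT doc_belzart, doc_tariftyp, doc_ab, doc_bis, MAX(tar_tarifart)
    FROM applicable_tariffs
    GROUP BY doc_belzart, doc_tariftyp, doc_ab, doc_bis
""").fetchall()

print(applicable_count, rows)
```

Because the tariff's own ab and bis columns are not part of the GROUP BY, several covering tariff rows with the same tarifart collapse into a single result row per document.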

Construct SQL where clause to pull data within Power Query

Basically I want to retrieve rows of data that meet my clause conditions using Power Query.
I have 400 rows of lookup values in my spreadsheet.
Each row represents one lookup code, for example AAA1, AAB2 and so on.
So let's say I have a select statement and I want to construct the where clause from the above codes, so that my final SQL statement looks like
select * from MyTable where Conditions in ('AA1', 'AAB2')
So far I have this:
let
Source = Excel.CurrentWorkbook(){[Name="Table5"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Form ID",
Int64.Type}}),
test = Sql.Database("myserver", "myDB", [Query="SELECT * FROM myTable where" & #"Changed Type" & "])"
in
test
Obviously that didn't work, but that's my pseudo scenario anyway.
Could you please advise what to do?
Thank you
Peddie
I would create a "lookup" Power Query based on the Excel table. I would set the "Load To" properties to "Only Create Connection".
Then I would start the main Query by connecting to the SQL server using the Navigator to select "MyTable". Then I would add a Merge step to the main Query, to join to the "lookup" Query, matching the "Conditions" column to the "lookup" code. I would set the Join Type to "Inner". The Merge properties window will show you visually if the 2 columns you select actually contain matching data.
This approach does not require any coding, and is easier to build, extend and maintain.
Mike Honey's join is best for your problem, but here's a more general solution if you find yourself needing other logic in your where clause.
Normally Power query only generates row filters on an equality expression, but you can put any code you want in a Table.SelectRows filter, like each List.Contains({"AA1", "AAB2"}, [Conditions])
So for your table, your query would look something like:
let
Source = Excel.CurrentWorkbook(){[Name="Table5"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Form ID", Int64.Type}}),
test = Sql.Database("myserver", "myDB"),
yourTable = test{[Name="myTable"]}[Data],
filtered = Table.SelectRows(yourTable, each List.Contains(#"Changed Type"[Form ID], [Conditions]))
in
filtered
The main downside to using the library functions is that Table.SelectRows only knows how to generate SQL where clauses for specific expression patterns, so the row filter probably runs on your machine after downloading the whole table, instead of having SQL Server run the filter.
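
For comparison, this is roughly what a server-side filter built from a list of lookup codes looks like when the IN clause is constructed explicitly. The sketch is in Python with SQLite rather than Power Query M, purely to show the shape of the generated statement; `build_in_query` and the sample rows are hypothetical, and the table and column names are taken from the question.

```python
import sqlite3
from typing import List, Sequence, Tuple


def build_in_query(table: str, column: str,
                   codes: Sequence[str]) -> Tuple[str, List[str]]:
    """Build 'SELECT ... WHERE column IN (?, ?, ...)' with one placeholder per code.

    The table and column names are interpolated directly, so they must come
    from trusted code, never from user input. The codes are bound as parameters.
    """
    placeholders = ", ".join("?" for _ in codes)
    sql = f"SELECT * FROM {table} WHERE {column} IN ({placeholders})"
    return sql, list(codes)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (Conditions TEXT, Payload TEXT)")
conn.executemany(
    "INSERT INTO MyTable VALUES (?, ?)",
    [("AAA1", "first"), ("AAB2", "second"), ("ZZZ9", "third")],
)

lookup_codes = ["AAA1", "AAB2"]  # would come from the spreadsheet column
sql, params = build_in_query("MyTable", "Conditions", lookup_codes)
matched = conn.execute(sql + " ORDER BY Conditions", params).fetchall()

print(sql)
print(matched)
```

Binding the codes as parameters avoids quoting problems that arise when the values are concatenated into the SQL text by hand.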

Thinking Sphinx indexing performance

I have a large index definition that takes too long to index. I suspect the main problem is caused by the many LEFT OUTER JOINs generated.
I saw this question, but can't find documentation about using source: :query, which seems to be part of the solution.
My index definition and the resulting query can be found here: https://gist.github.com/jonsgold/fdd7660bf8bc98897612
How can I optimize the generated query to run faster during indexing?
The 'standard' sphinx solution to this would be to use ranged queries.
http://sphinxsearch.com/docs/current.html#ex-ranged-queries
... splitting up the query into lots of small parts, so the database server has a better chance of being able to run the query (rather than one huge query)
But I have no idea how to actually enable that in Thinking Sphinx. Can't see anything in the documentation. Could help you edit the sphinx.conf, but also not sure how TS will cope with you manually editing the config file.
This is the solution that worked best (from the linked question). Basically, you can remove a piece of the main query sql_query and define it separately as a sql_joined_field in the sphinx.conf file.
It's important to add all relevant sql conditions to each sql_joined_field (such as sharding indexes by modulo on the ID). Here's the new definition:
ThinkingSphinx::Index.define(
:incident,
with: :active_record,
delta?: false,
delta_processor: ThinkingSphinx::Deltas.processor_for(ThinkingSphinx::Deltas::ResqueDelta)
) do
indexes "SELECT incidents.id * 51 + 7 AS id, sites.name AS site FROM incidents LEFT OUTER JOIN sites ON sites.id = site_id WHERE incidents.deleted = 0 AND EXISTS (SELECT id FROM accounts WHERE accounts.status = 'enabled' AND incidents.account_id = id) ORDER BY id", as: :site, source: :query
...
has
...
end
ThinkingSphinx::Index.define(
:incident,
with: :active_record,
delta?: true,
delta_processor: ThinkingSphinx::Deltas.processor_for(ThinkingSphinx::Deltas::ResqueDelta)
) do
indexes "SELECT incidents.id * 51 + 7 AS id, sites.name AS site FROM incidents LEFT OUTER JOIN sites ON sites.id = site_id WHERE incidents.deleted = 0 AND incidents.delta = 1 AND EXISTS (SELECT id FROM accounts WHERE accounts.status = 'enabled' AND incidents.account_id = id) ORDER BY id", as: :site, source: :query
...
has
...
end
The magic that defines the field site as a separate query is the option source: :query at the end of the line.
Notice the core index definition has the parameter delta?: false, while the delta index definition has the parameter delta?: true. That's so I could use the condition WHERE incidents.delta = 1 in the delta index and filter out irrelevant records.
I found sharding didn't perform any better, so I reverted to one unified index.
See the whole index definition here: https://gist.github.com/jonsgold/05e2aea640320ee9d8b2.
Important to remember!
The Sphinx document ID offset must be handled manually. That is, whenever an index for another model is added or removed, my calculated document ID will change. This must be updated.
So, in my example, if I added an index for a different model (not :incident), I would have to run rake ts:configure to find out my new offset and change incidents.id * 51 + 7 accordingly.
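
The arithmetic behind that expression can be sketched in Python. The multiplier 51 and offset 7 come from the example above; reading them as "number of indices" and "this index's offset" is my interpretation, and the helper names are hypothetical.

```python
INDEX_COUNT = 51   # multiplier from the example (assumed to track the number of indices)
INDEX_OFFSET = 7   # offset from the example (assumed to identify this index)


def to_document_id(record_id: int,
                   index_count: int = INDEX_COUNT,
                   offset: int = INDEX_OFFSET) -> int:
    """Same formula as 'incidents.id * 51 + 7' in the SQL above."""
    return record_id * index_count + offset


def from_document_id(document_id: int, index_count: int = INDEX_COUNT):
    """Invert the formula: returns (record_id, offset)."""
    return divmod(document_id, index_count)


doc_id = to_document_id(42)
print(doc_id, from_document_id(doc_id))

# If the multiplier changes (for example after adding another index),
# the same record maps to a different document ID.
print(to_document_id(42, index_count=52))
```

This is why the hard-coded expression in the sql_joined_field query has to be revisited whenever the set of indices changes: the IDs it produces would no longer match the ones used by the main query.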

Transact-SQL Ambiguous column name

I'm having trouble with the 'Ambiguous column name' issue in Transact-SQL, using the Microsoft SQL 2012 Server Management Studio.
I've been looking through some of the answers already posted on Stack Overflow, but they don't seem to work for me, and parts of them I simply don't understand or lose the overall picture of.
Executing the following script :
USE CDD
SELECT Artist, Album_title, track_title, track_number, Release_Year, EAN_code
FROM Artists AS a INNER JOIN CD_Albumtitles AS c
ON a.artist_id = c.artist_id
INNER JOIN Track_lists AS t
ON c.title_id = t.title_id
WHERE track_title = 'bohemian rhapsody'
triggers the following error message :
Msg 209, Level 16, State 1, Line 3
Ambiguous column name 'EAN_code'.
Note that this is a CD database with artist names, album titles and track lists. Both the tables 'CD_Albumtitles' and 'Track_lists' have a column with identical EAN codes. The EAN code is an important international code used to uniquely identify CD albums, which is why I would like to keep using it.
You need to put the alias in front of all the columns in your select list and your where clause. You're getting that error because one of the columns you have currently is coming from multiple tables in your join. If you alias the columns, it will essentially pick one or the other of the tables.
SELECT a.Artist,c.Album_title,t.track_title,t.track_number,c.Release_Year,t.EAN_code
FROM Artists AS a INNER JOIN CD_Albumtitles AS c
ON a.artist_id = c.artist_id
INNER JOIN Track_lists AS t
ON c.title_id = t.title_id
WHERE t.track_title = 'bohemian rhapsody'
So choose one of the source tables, prefixing the field with the alias (or table name):
SELECT Artist,Album_title,track_title,track_number,Release_Year,
c.EAN_code -- or t.EAN_code, which should retrieve the same value
By the way, try to prefix all the fields (in the select, the join, the group by, etc.); it makes maintenance easier.
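
The error and the fix can be reproduced in a few lines of Python with SQLite. This is a sketch, not SQL Server: SQLite words the message differently from Msg 209, the schema is cut down to two tables, and the album title and EAN value are made-up sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE CD_Albumtitles (title_id INTEGER, Album_title TEXT, EAN_code TEXT);
    CREATE TABLE Track_lists    (title_id INTEGER, track_title TEXT, EAN_code TEXT);
    INSERT INTO CD_Albumtitles VALUES (1, 'A Night at the Opera', '0000000000001');
    INSERT INTO Track_lists    VALUES (1, 'bohemian rhapsody', '0000000000001');
""")

join_sql = ("FROM CD_Albumtitles AS c "
            "INNER JOIN Track_lists AS t ON c.title_id = t.title_id")

# Unqualified EAN_code exists in both joined tables, so the engine refuses to guess.
try:
    conn.execute(f"SELECT Album_title, EAN_code {join_sql}")
    error_message = None
except sqlite3.OperationalError as exc:
    error_message = str(exc)

# Qualifying the column with an alias removes the ambiguity.
qualified = conn.execute(
    f"SELECT c.Album_title, c.EAN_code {join_sql}").fetchall()

print(error_message)
print(qualified)
```

Album_title needs no prefix to run because it exists in only one table, but prefixing it anyway keeps the query from breaking if a column with that name is later added elsewhere.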