Update from existing table in Redshift - postgresql

I would like to update a value in Redshift table from results of other table, I'm trying to run to following query but received an error.
update section_translate
set word=t.section_type
from (
select distinct section_type from mr_usage where section_type like '%sディスコ')t
where word = '80sディスコ'
The error I received:
ERROR: Target table must be part of an equijoin predicate
Can't understand what is incorrect in my query.

You need to make the uncorrelated subquery to a correlated subquery,
update section_translate
set word=t.section_type
from (
select distinct section_type,'80sディスコ' as word from mr_usage where section_type like '%sディスコ')t
where section_translate.word = t.word
Otherwise, each record of the outer query is eligible for updates and the query engine rejects it. The way Postgre (and thus Redshift) evaluates uncorrelated subqueries is slightly different from SQL Server/ Oracle etc.

Related

Pivot function without manually typing values in `for in`?

Documentation provides an example of using the pivot() function.
SELECT *
FROM (SELECT partname, price FROM part) PIVOT (
AVG(price) FOR partname IN ('prop', 'rudder', 'wing')
);
I would like to use pivot() without having to manually specify each value of partname. I want all parts. I tried:
SELECT *
FROM (SELECT partname, price FROM part) PIVOT (
AVG(price) FOR partname);
That gave an error. Then tried:
SELECT *
FROM (SELECT partname, price FROM part) PIVOT (
AVG(price) FOR partname IN (select distinct partname from part)
);
That also threw an error.
How can I tell Redshift to include all values of partname in the pivot?
I don't think this can be done in a simple single query. This would mean that the query compiler would need to work without knowing how many output columns will be produced. I don't think it can do that.
You can do this in multiple queries - use a query to create the list of partnames and then use this to "generate" a second query that populates the IN list. So something needs issue these queries and generated the second. This can be some code external to Redshift (lots of options) or a stored procedure in Redshift. This code, no matter where it exists, should understand that Redshift has a max number of columns limit - 1,600.
The Redshift docs are fairly good on the topic of dynamic SQL for stored procedures. The EXECUTE statement will be used to fire off the second query in a stored procedure. See: https://docs.aws.amazon.com/redshift/latest/dg/c_PLpgSQL-statements.html

The sqliite db query is not working in postgresql db

i am having a query which is working correctly in SQLite. but its giving error in PostgreSQL.
SELECT decks.id, decks.name, count(cards.id)
from decks
JOIN cards ON decks.id = cards.did
GROUP BY cards.did
above query is giving error in postgresql.
ERROR: column "decks.id" must appear in the GROUP BY clause or be used in an aggregate function
LINE 1: SELECT decks.id, decks.name, count(cards.id) FROM decks JOIN...
You can't have columns in the SELECT list, that are not used in an aggregate function or part of the GROUP BY. The fact that SQLite accepts this, is a bug in SQLite. The fact that Postgres rejects this, is correct.
You need to rewrite your query to:
SELECT decks.id, decks.name, count(cards.id)
from decks
JOIN cards ON decks.id = cards.did
GROUP BY decks.id, decks.name;
If decks.id is the primary key, you can shorten the grouping to GROUP BY decks.id

Optimising computing time on sql query

I usually use the update query to change or update a column in my PostgreSQL database.
I create subqueries from the data in a second schema to integrate it with the column to be updated.
In this code I update the 'unit_source_concept_id' column of the measurement table in the OMOP CDM schema from a table in another schema ('transform_semantic' schema).
with subquery1 as (
select unit_source_concept_id
from transform_semantique.measurement )
update omop.measurement m
set unit_source_concept_id = subquery1.unit_source_concept_id
from subquery1
where measurement_source_concept_id <> 5;
I don't know if the update query is the most suitable and optimal (in terms of computing time). This request took +6000s to execute.
Do you know a method to optimise this query?

IN clause with large list in OpenJpa causing too complex statement

I have to create a named query where I need to group my results by some fields and also using an IN clause to limit my results.
The it looks something like this
SELECT new MyDTO(e.objID) FROM Entity e WHERE e.objId IN (:listOfIDs) GROUP BY e.attr1, e.attr2
I'm using OpenJPA and IBM DB2. In some cases my List of IDs can be very large (>80.000 IDs) and then the generated SQL statement becomes too complex for DB2, because the final generated statement prints out all IDs, like this:
SELECT new MyDTO(e.objID) FROM Entity e WHERE e.objId IN (1,2,3,4,5,6,7,...) GROUP BY e.attr1, e.attr2
Is there any good way to handle this kind of query? A possible Workaround would be to write the IDs in a temporary table and then using the IN clause on this table.
You should put all of the values in a table and rewrite the query as a join. This will not only solve your query problem, it should be more efficient as well.
declare global temporary table ids (
objId int
) with replace on commit preserve rows;
--If this statement is too long, use a couple of insert statements.
insert into session.ids values
(1,2,3,4,....);
select new mydto(e.objID)
from entity e
join session.ids i on
e.objId = i.objId
group by e.attr1, e.attr2;

Cannot sort a row of size 8130, which is greater than the allowable maximum of 8094

SELECT DISTINCT tblJobReq.JobReqId
, tblJobReq.JobStatusId
, tblJobClass.JobClassId
, tblJobClass.Title
, tblJobReq.JobClassSubTitle
, tblJobAnnouncement.JobClassDesc
, tblJobAnnouncement.EndDate
, blJobAnnouncement.AgencyMktgVerbage
, tblJobAnnouncement.SpecInfo
, tblJobAnnouncement.Benefits
, tblSalary.MinRateSal
, tblSalary.MaxRateSal
, tblSalary.MinRateHour
, tblSalary.MaxRateHour
, tblJobClass.StatementEval
, tblJobReq.ApprovalDate
, tblJobReq.RecruiterId
, tblJobReq.AgencyId
FROM ((tblJobReq
LEFT JOIN tblJobAnnouncement ON tblJobReq.JobReqId = tblJobAnnouncement.JobReqId)
INNER JOIN tblJobClass ON tblJobReq.JobClassId = tblJobClass.JobClassId)
LEFT JOIN tblSalary ON tblJobClass.SalaryCode = tblSalary.SalaryCode
WHERE (tblJobReq.JobClassId in (SELECT JobClassId
from tblJobClass
WHERE tblJobClass.Title like '%Family Therapist%'))
When i try to execute the query it results in the following error.
Cannot sort a row of size 8130, which is greater than the allowable maximum of 8094
I checked and didn't find any solution. The only way is to truncate (substring())the "tblJobAnnouncement.JobClassDesc" in the query which has column size of around 8000.
Do we have any work around so that i need not truncate the values. Or Can this query be optimised? Any setting in SQL Server 2000?
The [non obvious] reason why SQL needs to SORT is the DISTINCT keyword.
Depending on the data and underlying table structures, you may be able to do away with this DISTINCT, and hence not trigger this error.
You readily found the alternative solution which is to truncate some of the fields in the SELECT list.
Edit: Answering "Can you please explain how DISTINCT would be the reason here?"
Generally, the fashion in which the DISTINCT requirement is satisfied varies with
the data context (expected number of rows, presence/absence of index, size of row...)
the version/make of the SQL implementation (the query optimizer in particular receives new or modified heuristics with each new version, sometimes resulting in alternate query plans for various constructs in various contexts)
Yet, all the possible plans associated with a "DISTINCT query" involve *some form* of sorting of the qualifying records. In its simplest form, the plan "fist" produces the list of qualifying rows (records) (the list of records which satisfy the WHERE/JOINs/etc. parts of the query) and then sorts this list (which possibly includes some duplicates), only retaining the very first occurrence of each distinct row. In other cases, for example when only a few columns are selected and when some index(es) covering these columns is(are) available, no explicit sorting step is used in the query plan but the reliance on an index implicitly implies the "sortability" of the underlying columns. In other cases yet, steps involving various forms of merging or hashing are selected by the query optimizer, and these too, eventually, imply the ability of comparing two rows.
Bottom line: DISTINCT implies some sorting.
In the specific case of the question, the error reported by SQL Server and preventing the completion of the query is that "Sorting is not possible on rows bigger than..." AND, the DISTINCT keyword is the only apparent reason for the query to require any sorting (BTW many other SQL constructs imply sorting: for example UNION) hence the idea of removing the DISTINCT (if it is logically possible).
In fact you should remove it, for test purposes, to assert that, without DISTINCT, the query completes OK (if only including some duplicates). Once this fact is confirmed, and if effectively the query could produce duplicate rows, look into ways of producing a duplicate-free query without the DISTINCT keyword; constructs involving subqueries can sometimes be used for this purpose.
An unrelated hint, is to use table aliases, using a short string to avoid repeating these long table names. For example (only did a few tables, but you get the idea...)
SELECT DISTINCT JR.JobReqId, JR.JobStatusId,
tblJobClass.JobClassId, tblJobClass.Title,
JR.JobClassSubTitle, JA.JobClassDesc, JA.EndDate, JA.AgencyMktgVerbage,
JA.SpecInfo, JA.Benefits,
S.MinRateSal, S.MaxRateSal, S.MinRateHour, S.MaxRateHour,
tblJobClass.StatementEval,
JR.ApprovalDate, JR.RecruiterId, JR.AgencyId
FROM (
(tblJobReq AS JR
LEFT JOIN tblJobAnnouncement AS JA ON JR.JobReqId = JA.JobReqId)
INNER JOIN tblJobClass ON tblJobReq.JobClassId = tblJobClass.JobClassId)
LEFT JOIN tblSalary AS S ON tblJobClass.SalaryCode = S.SalaryCode
WHERE (JR.JobClassId in
(SELECT JobClassId from tblJobClass
WHERE tblJobClass.Title like '%Family Therapist%'))
FYI, running this SQL command on your DB can fix the problem if it is caused by space that needs to be reclaimed after dropping variable length columns:
DBCC CLEANTABLE (0,[dbo.TableName])
See: http://msdn.microsoft.com/en-us/library/ms174418.aspx
This is a limitation of SQL Server 2000. You can:
Split it into two queries and combine elsewhere
SELECT ID, ColumnA, ColumnB FROM TableA JOIN TableB
SELECT ID, ColumnC, ColumnD FROM TableA JOIN TableB
Truncate the columns appropriately
SELECT LEFT(LongColumn,2000)...
Remove any redundant columns from the SELECT
SELECT ColumnA, ColumnB, --IDColumnNotUsedInOutput
FROM TableA
Migrate off of SQL Server 2000