IN clause with large list in OpenJPA causing too-complex statement - DB2

I have to create a named query that groups my results by some fields and also uses an IN clause to limit them.
It looks something like this:
SELECT new MyDTO(e.objID) FROM Entity e WHERE e.objId IN (:listOfIDs) GROUP BY e.attr1, e.attr2
I'm using OpenJPA and IBM DB2. In some cases my list of IDs can be very large (>80,000 IDs), and the generated SQL statement then becomes too complex for DB2, because the final statement spells out every single ID, like this:
SELECT new MyDTO(e.objID) FROM Entity e WHERE e.objId IN (1,2,3,4,5,6,7,...) GROUP BY e.attr1, e.attr2
Is there any good way to handle this kind of query? A possible workaround would be to write the IDs into a temporary table and then run the IN clause against that table.

You should put all of the values in a table and rewrite the query as a join. This will not only solve your query problem, it should be more efficient as well.
declare global temporary table ids (
    objId int
) with replace on commit preserve rows;

-- If this statement gets too long, split it across several insert statements.
insert into session.ids values (1), (2), (3), (4), ...;
select new MyDTO(e.objID)
from Entity e
join session.ids i on e.objId = i.objId
group by e.attr1, e.attr2;
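If even a multi-row insert exceeds DB2's statement length limit, the IDs can simply be loaded in chunks. A minimal sketch (chunk boundaries and values are illustrative):
insert into session.ids values (1), (2), (3);
insert into session.ids values (4), (5), (6);
-- ... one statement per chunk until all IDs are loaded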

Related

SQLite query is not working in PostgreSQL

I have a query that works correctly in SQLite, but it gives an error in PostgreSQL.
SELECT decks.id, decks.name, count(cards.id)
FROM decks
JOIN cards ON decks.id = cards.did
GROUP BY cards.did
The above query gives the following error in PostgreSQL:
ERROR: column "decks.id" must appear in the GROUP BY clause or be used in an aggregate function
LINE 1: SELECT decks.id, decks.name, count(cards.id) FROM decks JOIN...
You can't have columns in the SELECT list that are neither used in an aggregate function nor part of the GROUP BY. That SQLite accepts this is a nonstandard extension of SQLite; that Postgres rejects it is correct.
You need to rewrite your query to:
SELECT decks.id, decks.name, count(cards.id)
FROM decks
JOIN cards ON decks.id = cards.did
GROUP BY decks.id, decks.name;
If decks.id is the primary key, you can shorten the grouping to GROUP BY decks.id; PostgreSQL 9.1 and later recognize that the other columns of decks are then functionally dependent on the key.
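For reference, a sketch of the shortened form, reusing the query from above:
SELECT decks.id, decks.name, count(cards.id)
FROM decks
JOIN cards ON decks.id = cards.did
GROUP BY decks.id;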

PostgreSQL: get references from a dictionary

I'm trying to build a query that gets the data from a table, but some of those columns hold foreign keys that I would like to replace with the associated name, all in one query.
Basically there's
table A with column 1:PKA-ID and column 2:name.
table B with column 1:PKB-ID, column 2:FKA-ID, column 3:amount.
I want to get all the rows of table B, but with each foreign key replaced by the associated name from table A.
I started building a query with a subquery plus an alias to get that, but of course I get more than one result per subquery, and I can't find a way to link the subquery to the ID of table B from the main query (might be exhausted, dumb, or both). I did something like this:
SELECT (SELECT "NAME" FROM A JOIN B ON ID = FKA-ID) AS name, amount FROM TABLEB;
It feels like such a simple query, and yet...
You don't need a join in the subselect.
SELECT pkb_id,
(SELECT name FROM a WHERE a.pka_id = b.fka_id),
amount
FROM b;
The subselect query runs for each and every row of its parent select and has the parent row available from the context.
You can also use a simple join.
SELECT b.pkb_id, a.name, b.amount
FROM b
JOIN a ON a.pka_id = b.fka_id;
Note that the join version puts fewer restrictions on the PostgreSQL query optimizer, so in some cases the join version may run faster. (For example, in PostgreSQL 9.6 the join might utilize multiple CPU cores; cf. Parallel Query.)

Update from existing table in Redshift

I would like to update a value in a Redshift table from the results of another table. I'm trying to run the following query, but it returns an error:
update section_translate
set word = t.section_type
from (
    select distinct section_type
    from mr_usage
    where section_type like '%sディスコ'
) t
where word = '80sディスコ'
The error I received:
ERROR: Target table must be part of an equijoin predicate
I can't understand what is incorrect in my query.
You need to turn the uncorrelated subquery into a correlated one:
update section_translate
set word = t.section_type
from (
    select distinct section_type, '80sディスコ' as word
    from mr_usage
    where section_type like '%sディスコ'
) t
where section_translate.word = t.word
Otherwise, every record of the target table is eligible for the update, and the query engine rejects it. The way PostgreSQL (and thus Redshift) evaluates uncorrelated subqueries is slightly different from SQL Server, Oracle, etc.
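In other words, Redshift wants the target table tied to the tables in the FROM clause through an equality condition. A minimal sketch of the general pattern (table and column names are illustrative):
update target_table
set col = s.col
from source_table s
where target_table.key = s.key;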

Hive: How to do a SELECT query to output a unique primary key using HiveQL?

I have the following dataset, which I want to transform into a table that can be exported to SQL. I am using Hive. The input is as follows:
call_id,stat1,stat2,stat3
1,a,b,c,
2,x,y,z,
3,d,e,f,
1,j,k,l,
The output table needs to have call_id as its primary key so it needs to be unique. The output schema should be
call_id,stat2,stat3,
1,b,c, or (1,k,l)
2,y,z,
3,e,f,
The problem is that when I use the keyword DISTINCT in the Hive query, the DISTINCT applies to all the columns combined. I want to apply the DISTINCT operation only to the call_id, something along the lines of:
SELECT DISTINCT(call_id), stat2,stat3 from intable;
However, this is not valid in Hive (I am not well-versed in SQL either).
The only legal query seems to be
SELECT DISTINCT call_id, stat2,stat3 from intable;
But this returns multiple rows with the same call_id, since the other columns differ and each row as a whole is distinct.
NOTE: There is no arithmetic relation between a,b,c,x,y,z, etc. So any trick of averaging or summing is not viable.
Any ideas how I can do this?
One quick idea, not the best one, but it will do the job:
hive> create table temp1(a int, b string);
hive> insert overwrite table temp1
      select call_id, max(concat(stat1,'|',stat2,'|',stat3)) from intable group by call_id;
hive> insert overwrite table intable
      select a, split(b,'\\|')[0], split(b,'\\|')[1], split(b,'\\|')[2] from temp1;
Note that Hive's split() takes a regular expression, so the pipe delimiter must be escaped as '\\|'.
"I want to apply the DISTINCT operation only to the call_id"
But how then will Hive know which row to eliminate?
Without knowing the amount of data / the size of the stat fields you have, the following query can do the job:
select distinct i1.call_id, i1.stat2, i1.stat3
from (
    select call_id, min(concat(stat1, '|', stat2, '|', stat3)) as smin
    from intable
    group by call_id
) i2
join intable i1 on i1.call_id = i2.call_id
    and concat(i1.stat1, '|', i1.stat2, '|', i1.stat3) = i2.smin;
(The '|' delimiter keeps different stat combinations from collapsing into the same concatenated string; it assumes '|' does not occur in the data.)

JPA 2.0: Batch query, safe and performant?

I am looking for a JPA-solution (vendor-independent) to execute a query in batches. The challenge is to make this performant as well as thread-safe.
Example query:
Query query = em.createQuery("select e from Entity e where e.property in :list");
The list is a collection whose size ranges from 1 to 385,000, hence the requirement to batch this query.
My initial, naive approach was to take sublists of the original list and loop until done. This was safe and worked well, except that it was not performant.
My second approach was to load everything from the list into a temp table (permanent in existence, but used as a temporary table) and then run the original query joined with the temp table. This is definitely performant, but it is not thread-safe: I need to clear the temp table after each batch, and without any thread ID or similar in the temp table, it is pretty unsafe (which it is at the moment).
I would really appreciate suggestions to arrive at a performant and safe way to tackle this issue.
Thanks
First of all, make sure the query is valid JPQL for your provider: strictly, the collection parameter of the IN expression should be parenthesized, i.e. where e.property in (:list).
Your strategy of populating a temp table looks fine to me. You could just make it contain an additional uuid column, and generate a new UUID each time you want to perform such a query:
generate a UUID
insert all the elements of the list in the table, with the uuid column set to the generated UUID
execute a query such as select e from Entity e, TempEntity temp where e.property = temp.property and temp.uuid = :uuid
execute a query to delete this run's rows from the temp table (not absolutely necessary): delete from TempEntity temp where temp.uuid = :uuid
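To make the steps concrete, here is a minimal sketch of the same flow at the SQL level; the table and column names (temp_entity, property) and the placeholder UUID are illustrative, not prescribed by the answer above:
-- one-time setup: a permanent table used as the temp store, keyed by a per-run UUID
create table temp_entity (
    uuid     varchar(36) not null,
    property bigint      not null
);

-- per run: insert every list element tagged with a freshly generated UUID
insert into temp_entity (uuid, property) values ('<generated-uuid>', 42);
-- ... one row per list element ...

-- run the actual query as a join, restricted to this run's rows
select e.*
from entity e
join temp_entity t on e.property = t.property
where t.uuid = '<generated-uuid>';

-- optional cleanup of this run's rows
delete from temp_entity where uuid = '<generated-uuid>';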