working with JPA repository queries


If this works:
repo.findFirstByConversionIncomeAmountLessThanEqualOrderByConversionIncomeAmountDesc(...);
and this works:
repo.findFirstByConversionEffectiveDateLessThanEqualOrderByConversionIncomeAmountDesc(...);
why does this return null?
repo.findFirstByConversionIncomeAmountLessThanEqualAndConversionEffectiveDateLessThanEqualOrderByConversionIncomeAmountDesc(...);
If I understand this correctly, it should be the same as
select top 1 *
from conversions
where income_amount <= ?1
and effective_date <= ?2
order by income_amount desc
There is a record that would satisfy the query.
-- edit --
Oh, yes, the "conversion" in the method name is necessary as I'm actually asking for an embedded object's values.

Related

SQL and optional sorting ASC / DESC

I know how to add optional parameters to an SQL query.
It is done like this:
select * from test t where (?1 is null or t.myColumn = ?1)
where ?1 is a parameter which you can pass from a language
such as Java (e.g. using named queries). The advantage is that I don't
need to use an ORM, nor do I need to manually build (concatenate) strings into
a query.
Is there anything similar to make ASC and DESC optional?

No, you will have to use dynamic SQL (statements constructed on the fly) for that.
There is a trick you could use for numeric columns:
... ORDER BY col * CASE WHEN $1 = 'DESC' THEN -1 ELSE 1 END
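For the dynamic SQL route, one common precaution is to never concatenate the caller's text into the statement and to emit only one of two fixed keywords instead. The following is a minimal sketch of that idea, not a definitive implementation: the class and method names are hypothetical, and the table and column names are simply the ones from the question.

```java
import java.util.Locale;

class SortQueryBuilder {
    // Hypothetical helper; table and column names come from the question's example.
    static String buildQuery(String direction) {
        // Only the literal keywords ASC or DESC are ever appended, so the
        // caller's input never becomes part of the SQL text.
        boolean descending = direction != null
                && direction.trim().toUpperCase(Locale.ROOT).equals("DESC");
        String keyword = descending ? "DESC" : "ASC";
        return "select * from test t where (?1 is null or t.myColumn = ?1)"
                + " order by t.myColumn " + keyword;
    }

    public static void main(String[] args) {
        System.out.println(buildQuery("desc"));
        System.out.println(buildQuery("asc; drop table test"));
    }
}
```

Anything that is not exactly DESC (ignoring case and surrounding whitespace) falls back to ASC, so the filter value can still be bound as a normal parameter.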

Will Postgres' DISTINCT function always return null as the first element?

I'm selecting distinct values from tables through Java's JDBC connector, and it seems that the NULL value (if there is one) is always the first row in the ResultSet.
I need to remove this NULL from the List where I load this ResultSet. The logic looks only at the first element and if it's null then ignores it.
I'm not using any ORDER BY in the query, can I still trust that logic? I can't find any reference in Postgres' documentation about this.

You can add a check for NOT NULL, like this:
select distinct columnName
from Tablename
where columnName IS NOT NULL
Also, if you do not provide an ORDER BY clause, the order in which you get the results is not guaranteed, so you cannot rely on it. It is better, and recommended, to provide an ORDER BY clause if you want your results in a particular order (i.e., ascending or descending).
If you are looking for a reference, the PostgreSQL documentation says:
If ORDER BY is not given, the rows are returned in whatever order the
system finds fastest to produce.
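If the filtering has to stay on the Java side, it can be made independent of row order by removing every null rather than inspecting only the first element. A minimal sketch, assuming the ResultSet has already been loaded into a list of strings (the class and method names are hypothetical):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

class DistinctValues {
    // Returns a copy of the list with every null removed, wherever it appears.
    static List<String> withoutNulls(List<String> values) {
        List<String> result = new ArrayList<>(values);
        result.removeIf(Objects::isNull);
        return result;
    }

    public static void main(String[] args) {
        // The null sits in the middle here, which a first-element check would miss.
        List<String> loaded = Arrays.asList("b", null, "a");
        System.out.println(withoutNulls(loaded));
    }
}
```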

If it is not stated in the manual, I wouldn't trust it. However, just for fun, and to try to figure out what logic is being used: running the following query does bring the NULL (for no apparent reason) to the top, while all other values are in an apparently random order:
with t(n) as (values (1),(2),(1),(3),(null),(8),(0))
select distinct * from t
However, cross joining the table with a modified version of itself brings two NULLs to the top but leaves other NULLs dispersed randomly throughout the result set. So there doesn't seem to be any clear-cut logic that clumps all NULL values at the top.
with t(n) as (values (1),(2),(1),(3),(null),(8),(0))
select distinct * from t
cross join (select n+3 from t) t2

postgreSQL classification limit

I have a requirement to query the top 5 news items for each type and return them to the frontend, implemented with JPA.
I have two solutions so far:
A. Manually append UNION SQL via an annotation.
B. Call a service in a loop with a different type parameter each time.
In fact, what I want is SQL like the following:
(select id, title, content
from portal p
where p.type = 'NEWS'
order by create_date
limit 5)
union
(select id, title, content
from portal p
where p.type = 'MAG'
order by create_date
limit 5)
union...
Solution A needs many SQL statements coded in Java, while solution B is not efficient because there are more than 10 types.
Is there any other way to query the data, by annotation or a PostgreSQL function? I'm new to both JPA and Postgres.
Thanks in advance.

You can do this with a single SQL statement. I'm not sure whether you'll be able to avoid a table scan. You might need to include some more columns, depending most likely on whether you need to sort by them.
select *
from (select
id, title, content,
row_number() over (partition by type order by create_date asc) row_num
from portal
) as numbered_rows
where row_num <= 5;
One advantage of this kind of SQL statement is that it requires no maintenance. It will continue to work correctly no matter how many different types you add.
Think carefully whether you need the first five (order by create_date ASC) or the latest five (order by create_date DESC).
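To make the semantics of the window-function query concrete, here is an in-memory Java sketch of the same idea: partition the rows by type, order each partition by creation date, and keep the first few. It is only an illustration of what the SQL computes, not a replacement for running it in the database; the `Item` class and all names are hypothetical stand-ins for the portal table.

```java
import java.time.LocalDate;
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class TopPerType {
    // Hypothetical stand-in for one row of the portal table.
    static final class Item {
        final long id;
        final String type;
        final LocalDate createDate;

        Item(long id, String type, LocalDate createDate) {
            this.id = id;
            this.type = type;
            this.createDate = createDate;
        }
    }

    // Mirrors "partition by type order by create_date asc" with row_num <= limit.
    static Map<String, List<Item>> firstPerType(List<Item> rows, int limit) {
        Map<String, List<Item>> byType = rows.stream()
                .collect(Collectors.groupingBy(r -> r.type, LinkedHashMap::new, Collectors.toList()));
        Map<String, List<Item>> result = new LinkedHashMap<>();
        for (Map.Entry<String, List<Item>> entry : byType.entrySet()) {
            List<Item> top = entry.getValue().stream()
                    .sorted(Comparator.comparing((Item r) -> r.createDate))
                    .limit(limit)
                    .collect(Collectors.toList());
            result.put(entry.getKey(), top);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Item> rows = Arrays.asList(
                new Item(1, "NEWS", LocalDate.of(2020, 1, 3)),
                new Item(2, "NEWS", LocalDate.of(2020, 1, 1)),
                new Item(3, "MAG", LocalDate.of(2020, 1, 2)),
                new Item(4, "NEWS", LocalDate.of(2020, 1, 2)));
        firstPerType(rows, 2).forEach((type, items) ->
                items.forEach(i -> System.out.println(type + " " + i.id)));
    }
}
```

Swapping the comparator for its `reversed()` form corresponds to ordering by create_date DESC, i.e. the latest rows rather than the first ones.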

How to specify two expressions in the select list when the subquery is not introduced with EXISTS

I have a query that uses a subquery and I am having a problem returning the expected results. The error I receive is..."Only one expression can be specified in the select list when the subquery is not introduced with EXISTS." How can I rewrite this to work?
SELECT
a.Part,
b.Location,
b.LeadTime
FROM
dbo.Parts a
LEFT OUTER JOIN dbo.Vendor b ON b.Part = a.Part
WHERE
b.Location IN ('A','B','C')
AND
Date IN (SELECT Location, MAX(Date) FROM dbo.Vendor GROUP BY Location)
GROUP BY
a.Part,
b.Location,
b.LeadTime
ORDER BY
a.Part

I think something like this may be what you're looking for. You didn't say what version of SQL Server--this works in SQL 2005 and up:
SELECT
p.Part,
p.Location, -- from *p*, otherwise if no match we'll get a NULL
v.LeadTime
FROM
dbo.Parts p
OUTER APPLY (
SELECT TOP (1) * -- * here is okay because we specify columns outside
FROM dbo.Vendor v
WHERE p.Location = v.Location -- the correlation part
ORDER BY v.Date DESC
) v
WHERE
p.Location IN ('A','B','C')
ORDER BY
p.Part
;
Now, your query can be repaired as is by adding the "correlation" part to change your query into a correlated subquery as demonstrated in Kory's answer (you'd also remove the GROUP BY clause). However, that method still requires an additional and unnecessary join, hurting performance, plus it can only pull one column at a time. This method allows you to pull all the columns from the other table, and has no extra join.
Note: this gives logically the same results as Lamak's answer, however I prefer it for a few reasons:
When there is an index on the correlation columns (Location, here) this can be satisfied with seeks, but the Row_Number solution has to scan (I believe).
I prefer the way this expresses the intent of the query more directly and succinctly. In the Row_Number method, one must get out to the outer condition to see that we are only grabbing the rn = 1 values, then bop back into the CTE to see what that is.
Using CROSS APPLY or OUTER APPLY, all the other tables not involved in the single-inner-row-per-outer-row selection are outside where (to me) they belong. We aren't squishing concerns together. Using Row_Number feels a bit like throwing a DISTINCT on a query to fix duplication rather than dealing with the underlying issue. I guess this is basically the same issue as the previous point worded in a different way.
The moment you have TWO tables from which you wish to pull the most recent value, the Row_Number() solution blows up completely. With this syntax, you just easily add another APPLY clause, and it's crystal clear what you're doing. There is a way to use Row_Number for the multiple tables scenario by moving the other tables outside, but I still don't prefer that syntax.
Using this syntax allows you to perform additional joins based on whether the selected row exists or not (in the case that no matching row was found). In the Row_Number solution, you can only reasonably do that NOT NULL checking in the outer query--so you are forced to split up the query into multiple, separated parts (you don't want to be joining to values you will be discarding!).
P.S. I strongly encourage you to use aliases that hint at the table they represent. Please don't use a and b. I used p for Parts and v for Vendor--this helps you and others make sense of the query more quickly in the future.

If I understood you correctly, you want the rows with the max date for locations A, B and C. Now, assuming SQL Server 2005+, you can do this:
;WITH CTE AS
(
SELECT
a.Part,
b.Location,
b.LeadTime,
RN = ROW_NUMBER() OVER(PARTITION BY a.Part ORDER BY [Date] DESC)
FROM
dbo.Parts a
LEFT OUTER JOIN dbo.Vendor b ON b.Part = a.Part
WHERE
b.Location IN ('A','B','C')
)
SELECT Part,
Location,
LeadTime
FROM CTE
WHERE RN = 1
ORDER BY Part

In your subquery you need to correlate the Location and Part to the outer query.
Example:
Date = (SELECT MAX(Date)
FROM dbo.Vendor v
WHERE v.Location = b.Location
AND v.Part = b.Part
)
So this will bring back one date for each location and part.

Why does this Oracle 10g SQL run slow only when I query a subquery with a where clause?

I can't paste in the entire SQL for various reasons, so consider this example:
select *
from
(select nvl(get_quantity(1), 10) available_qty
from dual)
where available_qty > 30;
get_quantity is a function that makes a calculation based on the ID of a record that's passed through it. If it returns null, I use nvl() to force it to 10.
The query runs very slow when I use the WHERE clause in the parent query. When I comment out the WHERE clause, however, it runs very fast. What I don't get is why it can display the data very fast, but it can't query it just as fast. I am querying the results of a subquery, too. I was under the impression that subqueries return a "rendered" dataset. It's almost as if querying the available_qty identifier is causing it to reference something within the subquery.
This is why I don't think the contents of the get_quantity function are relevant here, so I didn't bother posting it. Instead, I think it's a misunderstanding on my part of how Oracle handles subqueries and whatnot.
Do any of you Oracle gurus have any idea what I am doing wrong?
Afterthought: as I was entering tags for this question, the tag "correlated subquery" came up. In doing some quick research, it seems that this type of subquery somewhat depends on the outer query. Could this be related to my problem?

Let's try an experiment. First we'll run the following query:
select lvl, rnd
from (select level as lvl from dual connect by level <= 5) a,
(select dbms_random.value() rnd from dual) b;
The "a" subquery will return 5 rows with values from 1 to 5. The "b" subquery will return one row with a random value. If the function is run before the two tables are joined (as a Cartesian product), the same random value will be returned for each row. The actual results:
LVL RND
---------- ----------
1 .417932089
2 .963531718
3 .617016889
4 .128395638
5 .069405568
5 rows selected.
Clearly the function was run for each of the joined rows, not for the subquery before the join. This is a result of Oracle's optimizer deciding that the best path for the query is to do things in that order. To prevent this, we have to add something to the second subquery that will make Oracle run the subquery in its entirety before performing the join. We'll add rownum to the subquery, since Oracle knows rownum will change if it's run after the join. The following query demonstrates this:
select lvl, rnd from (
select level as lvl from dual connect by level <= 5) a,
(select dbms_random.value() rnd, rownum from dual) b;
As you can see from the results, the function was only run once in this case:
LVL RND
---------- ----------
1 .028513902
2 .028513902
3 .028513902
4 .028513902
5 .028513902
5 rows selected.
In your case, it seems likely that the filter provided by the where clause is making the optimizer take a different path, where it's running the function repeatedly, rather than once. By making Oracle run the subquery as written, you should get more consistent run-times.
