What is the proper way to translate a complicated SQL query with custom columns and rdbms-specific functions to Doctrine - postgresql

I have been a Propel user for years and only recently started switching to Doctrine. It's still quite new to me, and sometimes Propel habits kick in and make it hard for me to "think in Doctrine". Below is a specific case. You don't have to know Propel to answer my question - I also present my case in raw SQL.
A simplified structure of the tables that my query refers to looks like this:
Application table has FK to Admin which has FK to User (fos_user in the DB)
ApplicationUser table has FK to Application
My query gets all Application records with custom columns containing additional info retrieved from related User records (through Admin) and some COUNTs of related ApplicationUser objects, one of which is additionally filtered (adminname, usercount, usercountperiod columns added to the query).
I have a Propel query like this:
ApplicationQuery::create()
->leftJoinApplicationUser()
->useAdminQuery()
->leftJoinUser()
->endUse()
->withColumn('fos_user.username', 'adminname')
->withColumn('COUNT(application_user.id)', 'usercount')
->withColumn('COUNT(application_user.id) FILTER '
. '(WHERE score > 0 AND '
. ' application_user.created_at >= to_timestamp('.strtotime($users_scored['begin']).') and '
. ' application_user.created_at < to_timestamp('.strtotime($users_scored['end']).') )', 'usercountperiod')
->groupById()
->groupBy('User.Id')
->orderById('DESC')
->paginate( ....
This is how it translates to SQL (PostgreSQL):
SELECT application.id, application.name, ...,
fos_user.username AS "adminname",
COUNT(application_user.id) AS "usercount",
COUNT(application_user.id) FILTER (
WHERE score > 0 AND
application_user.created_at >= to_timestamp(1491004800) and
application_user.created_at < to_timestamp(1498780800) ) AS "usercountperiod"
FROM application
LEFT JOIN application_user ON (application.id=application_user.application_id)
LEFT JOIN admin ON (application.admin_id=admin.id)
LEFT JOIN fos_user ON (admin.id=fos_user.id)
GROUP BY application.id,fos_user.id
ORDER BY application.id DESC
LIMIT 15
As you can see it's quite complex (in terms of translating it to Doctrine ORM, when you're a Doctrine newbie like me :) ). It uses specific features of PostgreSQL:
being able to include only Primary Key in GROUP BY statement, while other columns from the same table can be used in SELECT without aggregating function or inclusion in GROUP BY (because they are "dependent" on the PK);
FILTER which allows you to further filter records that are fed into aggregate functions
It also uses some joins and adds custom columns (adminname, usercount, usercountperiod) which I can access in my resulting Propel Model objects (with functions like $result->getAdminname()).
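For anyone comparing options: COUNT(x) FILTER (WHERE cond) can also be written as COUNT(CASE WHEN cond THEN x END), because COUNT skips NULLs. That form may be easier to express in a query language that has no FILTER. Below is a minimal sketch of the equivalence, not a Doctrine solution. It uses Python's built-in sqlite3 as a stand-in for PostgreSQL, the tables are trimmed-down versions of the schema above, created_at is stored as a plain Unix timestamp, and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE application (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE application_user (
        id INTEGER PRIMARY KEY,
        application_id INTEGER REFERENCES application(id),
        score INTEGER,
        created_at INTEGER  -- Unix timestamp (assumption, for simplicity)
    );
    INSERT INTO application VALUES (1, 'alpha'), (2, 'beta');
    INSERT INTO application_user VALUES
        (1, 1, 5, 1491004800),  -- scored, inside the period
        (2, 1, 0, 1491004900),  -- inside the period, but score = 0
        (3, 1, 7, 1500000000);  -- scored, outside the period
""")

begin, end = 1491004800, 1498780800

# The CASE expression yields NULL for rows that fail the condition,
# and COUNT ignores NULLs, so only matching rows are counted.
rows = conn.execute("""
    SELECT a.id,
           COUNT(au.id) AS usercount,
           COUNT(CASE WHEN au.score > 0
                       AND au.created_at >= :begin
                       AND au.created_at < :end
                      THEN au.id END) AS usercountperiod
    FROM application a
    LEFT JOIN application_user au ON au.application_id = a.id
    GROUP BY a.id
    ORDER BY a.id DESC
""", {"begin": begin, "end": end}).fetchall()

for row in rows:
    print(row)
```

Application 2 has no related rows, so both counts come out as zero thanks to the LEFT JOIN.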
My question is: what is the "Doctrine way" to achieve something as similar as possible, as simply as possible (use some PostgreSQL-specific or other RDBMS-specific features, add custom columns that are accessible through ORM objects, and so on)?
Thank you for help.

Related

Laravel 4.2 order by another collections field or result of a function

I have a Mongo database and I'm trying to write Eloquent code that changes some fields before using them in WHERE or ORDER BY clauses, something like these SQL queries:
Select ag.*, ht.*
from agency as ag inner join hotel as ht on ag.hotel_id = ht.id
Where ht.title = 'OrangeHotel'
-- or --
Select ag.*, ht.*
from agency as ag inner join hotel as ht on ag.hotel_id = ht.id
Order by ht.title
Sometimes there is no other table and I just need to use a calculated field in the WHERE or ORDER BY clause:
Select *
from agency
Where func(agency_admin) = 'testAdmin'
Select *
from agency
Order by func(agency_admin)
where func() is my custom function.
Any suggestions?
I have read Laravel 4/5, order by a foreign column for half of my problem, but I don't know how to use it.
For the first query: MongoDB only supports "join" partially, through the aggregation pipeline, which limits the aggregation to one collection. For "join"s between different collections/tables, just select from the collections one by one: first the one containing the "where" field, then the one that should "join" with it, and so on.
The second question puzzled me for some minutes until I saw this question and realized it is the same as your first one: sort the collection containing your sort field and retrieve some data, then go to the other collection.
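The "one collection at a time" idea can be sketched in plain Python. This is only an illustration of the two-step lookup, not Eloquent or MongoDB driver code: the lists of dicts are hypothetical stand-ins for the hotel and agency collections, and both function names are made up for the example.

```python
# Hypothetical in-memory stand-ins for the two MongoDB collections.
hotels = [
    {"id": 1, "title": "OrangeHotel"},
    {"id": 2, "title": "AppleHotel"},
]
agencies = [
    {"name": "north", "hotel_id": 1},
    {"name": "south", "hotel_id": 2},
    {"name": "east", "hotel_id": 1},
]


def agencies_for_hotel_title(title):
    """WHERE ht.title = ...: query hotels first, then agencies by hotel id."""
    hotel_ids = {h["id"] for h in hotels if h["title"] == title}
    return [a for a in agencies if a["hotel_id"] in hotel_ids]


def agencies_ordered_by_hotel_title():
    """ORDER BY ht.title: sort hotels first, then collect agencies per hotel."""
    ordered = []
    for hotel in sorted(hotels, key=lambda h: h["title"]):
        ordered.extend(a for a in agencies if a["hotel_id"] == hotel["id"])
    return ordered


print([a["name"] for a in agencies_for_hotel_title("OrangeHotel")])
print([a["name"] for a in agencies_ordered_by_hotel_title()])
```

With a real database, each comprehension would become one query against a single collection, with the ids from the first query passed into the second.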
For the 3rd question, this question should serve you well.

Faster/efficient alternative to IN clause in custom/native queries in spring data jpa

I have a custom query along these lines. I get the list of orderIds from outside. I have the entire order object list with me, so I can change the query in any way, if needed.
@Query("SELECT p FROM Person p INNER JOIN p.orders o WHERE o.orderId IN :orderIds")
public List<Person> findByOrderIds(@Param("orderIds") List<String> orderIds);
This query works fine, but the orderIds list passed in from outside sometimes holds anywhere between 50 and 1000 entries. The query then becomes very slow, taking as much as 5-6 seconds, which is not fast enough. Is there a better, faster way to do this? Searching Google and this site, I found that one can use ANY or EXISTS (Postgresql: alternative to WHERE IN respective WHERE NOT IN), create a temporary table (https://dba.stackexchange.com/questions/12607/ways-to-speed-up-in-queries-under-postgresql), or join to a VALUES clause (Alternative when IN clause is inputed A LOT of values (postgreSQL)). All these answers are tailored towards direct SQL calls, not JPA. The ANY keyword is not supported by spring-data. I am not sure about creating temporary tables in custom queries; I think I can do it with native queries, but I have not tried it. I am using spring-data + OpenJPA + PostgreSQL.
Can you please suggest a solution or give pointers? I apologize if I missed anything.
thanks,
Alice
You can use WHERE EXISTS instead of the IN clause, both in a native SQL query and in HQL/JPQL, and it can perform noticeably better. See the sample below.
Sample JPA Query:
SELECT emp FROM Employee emp JOIN emp.projects p where NOT EXISTS (SELECT project from Project project where p = project AND project.status <> 'Active')
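To make the temporary-table and EXISTS ideas from the question concrete, here is a rough sketch that runs the same lookup two ways and shows that both return the same people. It uses Python's built-in sqlite3 rather than JPA or PostgreSQL, the person/orders schema and the `wanted` temp table are assumptions made up for the example, and it says nothing about which variant is faster on your data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (order_id TEXT PRIMARY KEY, person_id INTEGER);
    INSERT INTO person VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO orders VALUES ('A1', 1), ('A2', 1), ('B1', 2), ('C1', 3);
""")

order_ids = ["A1", "A2", "C1"]  # the list passed in from outside

# Variant 1: a plain IN list with one placeholder per id.
placeholders = ", ".join("?" for _ in order_ids)
in_rows = conn.execute(
    f"""SELECT DISTINCT p.id, p.name
        FROM person p JOIN orders o ON o.person_id = p.id
        WHERE o.order_id IN ({placeholders})
        ORDER BY p.id""",
    order_ids,
).fetchall()

# Variant 2: load the ids into a temporary table, then test with EXISTS.
conn.execute("CREATE TEMP TABLE wanted (order_id TEXT PRIMARY KEY)")
conn.executemany("INSERT INTO wanted VALUES (?)", [(oid,) for oid in order_ids])
exists_rows = conn.execute(
    """SELECT p.id, p.name
       FROM person p
       WHERE EXISTS (
           SELECT 1 FROM orders o
           JOIN wanted w ON w.order_id = o.order_id
           WHERE o.person_id = p.id)
       ORDER BY p.id"""
).fetchall()

print(in_rows)
print(exists_rows)
```

The EXISTS variant needs no DISTINCT, because each person is tested once instead of being repeated per matching order.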

How to use "DISTINCT ON (field)" in Doctrine 2?

I know how to use "DISTINCT" in Doctrine 2, but I really need to use "DISTINCT ON (field)" and I don't know how to do that with the QueryBuilder.
My SQL query looks like:
SELECT DISTINCT ON (currency) currency, amount FROM payments ORDER BY currency
And this query works perfectly, but I can't use it with the QueryBuilder. Maybe I could write this query in some other way?
I would suggest that the SELECT DISTINCT ON (..) construct that PostgreSQL supports is outside the Object Relational Model (ORM) that is central to Doctrine. Or, perhaps put another way, because SELECT DISTINCT ON (..) is rare in SQL implementations Doctrine haven't coded for it.
Regardless of the actual logic for it not working, I would suggest you try Doctrine's "Native SQL". You need to map the results of your query to the ORM.
With NativeQuery you can execute native SELECT SQL statements and map
the results to Doctrine entities or any other result format supported
by Doctrine.
In order to make this mapping possible, you need to describe to
Doctrine what columns in the result map to which entity property. This
description is represented by a ResultSetMapping object.
With this feature you can map arbitrary SQL code to objects, such as
highly vendor-optimized SQL or stored-procedures.
SELECT DISTINCT ON (..) falls into vendor-optimized SQL I think, so using NativeQuery should allow you to access it.
Doctrine's QueryBuilder has some limitations. I haven't checked whether this is possible with the QueryBuilder, but I don't hesitate to use DQL when I don't know how to write a query with it.
Check these examples at
http://doctrine-orm.readthedocs.org/en/latest/reference/dql-doctrine-query-language.html#dql-select-examples
Hope this helps.
INDEX BY can be used in DQL, allowing first result rows indexed by the defined string/int field to be overwritten by following ones with the same index:
SELECT
p.currency,
p.amount
FROM Namespace\To\Payments p INDEX BY p.currency
ORDER BY p.currency ASC
DQL - EBNF - INDEX BY
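One thing worth checking before relying on INDEX BY as a replacement: DISTINCT ON keeps the first row of each group, while the overwrite behaviour described above keeps the last one. The sketch below mimics both semantics with plain Python dicts. It is not Doctrine code, the payment rows are invented, and they are assumed to arrive already sorted by currency and then amount.

```python
# Hypothetical payment rows, already sorted by currency, then amount.
payments = [
    {"currency": "EUR", "amount": 10},
    {"currency": "EUR", "amount": 25},
    {"currency": "USD", "amount": 5},
    {"currency": "USD", "amount": 40},
]

# DISTINCT ON (currency): keep the first row seen for each currency.
first_per_currency = {}
for row in payments:
    first_per_currency.setdefault(row["currency"], row)

# Indexing by currency with overwrite: later rows replace earlier ones.
last_per_currency = {}
for row in payments:
    last_per_currency[row["currency"]] = row

print({k: v["amount"] for k, v in first_per_currency.items()})
print({k: v["amount"] for k, v in last_per_currency.items()})
```

If the two need to agree, reversing the secondary sort order of the indexed query is one way to make the "last" row the one DISTINCT ON would have picked.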

Struggling with Lambda expression (VB .net)

I have a relatively simple thing that I can do easily in SQL, but I'm trying to get used to using Lambda expressions, and having a hard time.
Here is a simple example. Basically I have 2 tables.
tblAction (ActionID, ActionName)
tblAudit (AuditID, ActionID, Deleted)
tblAudit may have an entry regarding tblAction with the Deleted flag set to 1.
All I want to do is select actions where we don't have a Deleted entry in tblAudit. So the SQL statement is:
Select tblAction.*
From tblAction LEFT JOIN tblAudit on tblAction.ActionID=tblAudit.ActionID
where tblAudit.Deleted <> 1
What would be the equivalent of the above in VB.Net's LINQ? I tried:
Context.Actions.Where(Function(act) Context.Audit
.Where(Function(aud) aud.Deleted=False AndAlso aud.ActionID=act.ActionID)).ToList
But that is really an inner join type scenario where it requires that each entry in tblAction also has an Entry in tblAudit. I am using Entity Framework Code First to do the database mapping. Is there a way to define the mapping in a way where you can do this?
You should add
Public Overridable Property Audits As ICollection(Of Audit)
into your action entity class (to register the association between those tables).
Now you can just write what you mean:
(From act in Context.Actions Where Not act.Audits.Any(Function(audit) audit.Deleted)).ToArray
which is equivalent to
Context.Actions.Where(Function(act) Not act.Audits.Any(Function(audit) audit.Deleted)).ToArray
and let the LINQ parser do the hard SQL work.
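The logic of that "Not ... Any" filter, written out in Python for readers who don't use VB: an action is kept when none of its audit entries is flagged as deleted, which also keeps actions that have no audit entries at all. The classes and sample data below are hypothetical stand-ins for the two entities, not Entity Framework code.

```python
from dataclasses import dataclass, field


@dataclass
class Audit:
    audit_id: int
    deleted: bool


@dataclass
class Action:
    action_id: int
    name: str
    audits: list = field(default_factory=list)


actions = [
    Action(1, "create", [Audit(10, False)]),
    Action(2, "remove", [Audit(11, False), Audit(12, True)]),
    Action(3, "rename"),  # no audit entries at all
]

# Keep actions for which no audit entry has the deleted flag set.
not_deleted = [a for a in actions if not any(aud.deleted for aud in a.audits)]

print([a.name for a in not_deleted])
```

Note that this differs from the LEFT JOIN ... WHERE Deleted <> 1 query in the question, which would still return action 2 through its non-deleted audit row.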

what's the utility of array type?

I'm a total newbie with PostgreSQL, but I have good experience with MySQL. I was reading the documentation and discovered that PostgreSQL has an array type. I'm quite confused, since I can't see in which context this type would be useful within an RDBMS. Why would I choose this type instead of a classical one-to-many relationship?
Thanks in advance.
I've used them to make working with trees (such as comment threads) easier. You can store the path from the tree's root to a single node in an array, each number in the array is the branch number for that node. Then, you can do things like this:
SELECT id, content
FROM nodes
WHERE tree = X
ORDER BY path -- The array is here.
PostgreSQL will compare arrays element by element in the natural fashion so ORDER BY path will dump the tree in a sensible linear display order; then, you check the length of path to figure out a node's depth and that gives you the indentation to get the rendering right.
The above approach gets you from the database to the rendered page with one pass through the data.
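Here is a small sketch of that rendering pass. Python lists compare element by element in the same way, so sorting on the path reproduces what ORDER BY path does, and the path length gives the indentation. The node rows are invented for the example.

```python
# Hypothetical rows from the nodes table, in arbitrary order.
nodes = [
    {"id": 4, "path": [1, 2], "content": "second reply"},
    {"id": 1, "path": [1], "content": "root comment"},
    {"id": 3, "path": [1, 1, 1], "content": "nested reply"},
    {"id": 2, "path": [1, 1], "content": "first reply"},
    {"id": 5, "path": [2], "content": "another root"},
]

# Equivalent of ORDER BY path: element-wise comparison, and a shorter
# path sorts before any longer path that it is a prefix of.
ordered = sorted(nodes, key=lambda n: n["path"])

# Depth is the path length; indent two spaces per level below the root.
lines = ["  " * (len(n["path"]) - 1) + n["content"] for n in ordered]

print("\n".join(lines))
```

Each parent comes out immediately before its children, so the thread can be printed in a single pass.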
PostgreSQL also has geometric types, simple key/value types, and supports the construction of various other composite types.
Usually it is better to use traditional association tables but there's nothing wrong with having more tools in your toolbox.
One SO user is using it for what appears to be machine-aided translation. The comments to a follow-up question might be helpful in understanding his approach.
I've been using them successfully to aggregate recursive tree references using triggers.
For instance, suppose you've a tree of categories, and you want to find products in any of categories (1,2,3) or any of their subcategories.
One way to do it is to use an ugly with recursive statement. Doing so will output a plan stuffed with merge/hash joins on entire tables and an occasional materialize.
with recursive categories as (
select id
from categories
where id in (1,2,3)
union all
...
)
select products.*
from products
join product2category on...
join categories on ...
group by products.id, ...
order by ... limit 10;
Another is to pre-aggregate the needed data:
categories (
id int,
parents int[] -- (array_agg(parent_id) from parents) || id
)
products (
id int,
categories int[] -- array_agg(category_id) from product2category
)
index on categories using gin (parents)
index on products using gin (categories)
select products.*
from products
where categories && array(
select id from categories where parents && array[1,2,3]
)
order by ... limit 10;
One issue with the above approach is that row estimates for the && operator are junk. (The selectivity is a stub function that has yet to be written, and results in something like 1/200 rows irrespective of the values in your aggregates.) Put another way, you may very well end up with an index scan where a seq scan would be correct.
To work around it, I increased the statistics on the gin-indexed column, and I periodically look into pg_stats to extract more appropriate stats. When a cursory look at those stats reveals that using && for the specified values will produce an incorrect plan, I rewrite applicable occurrences of && with arrayoverlap() (the latter has a stub selectivity of 1/3), e.g.:
select products.*
from products
where arrayoverlap(cat_id, array(
select id from categories where arrayoverlap(parents, array[1,2,3])
))
order by ... limit 10;
(The same goes for the <@ operator...)
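For readers new to the overlap operator, the pre-aggregated lookup boils down to two "do these arrays share an element?" tests. Below is a toy sketch of that logic with Python sets. The category and product data are invented, `overlaps` is a hypothetical helper standing in for &&, and none of the indexing or planner behaviour discussed above is modelled.

```python
# category id -> parents array (its ancestors plus the category itself)
categories = {
    1: {1},
    4: {1, 4},
    5: {1, 4, 5},
    9: {9},
}

# product name -> categories array
products = {
    "lamp": {5},
    "desk": {9},
    "chair": {4, 9},
}


def overlaps(a, b):
    """Rough equivalent of the && operator: true if any element is shared."""
    return bool(set(a) & set(b))


wanted = [1, 2, 3]

# Inner subquery: categories whose parents overlap the wanted ids.
matching_categories = {
    cid for cid, parents in categories.items() if overlaps(parents, wanted)
}

# Outer query: products whose categories overlap that result.
matching_products = sorted(
    name for name, cats in products.items() if overlaps(cats, matching_categories)
)

print(matching_categories)
print(matching_products)
```

Categories 4 and 5 match because category 1 appears in their parents arrays, which is what lets the query reach subcategories without recursion.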