Self m:n Relation - postgresql

I have persons and a person can contact multiple other persons, so basically the "default" tables would be:
persons (id)
contacts (person1_id, person2_id)
With this schema, I'd have to issue queries like
FROM contacts c
WHERE ( person1_id = *id of person1* AND person2_id = *id of person2* )
OR ( person1_id = *id of person2* AND person2_id = *id of person1* )
to get the relation between two persons when I insert such a relation only once.
What is the common practice to deal with this situation?
Insert data once and do such an OR query
Insert the relation twice so that person1_id = id of person1 AND person2_id = id of person2 is enough
An entirely different approach?
The m:n table actually contains additional data, so if I create a relation for both ways, I'd have to duplicate the data
This is a core part of the application and most non-trivial queries involve at least a sub query that determines whether or not such a relation exists

If you write your insert logic so that person1_id < person2_id holds for every row, then you can just write
FROM contacts c
WHERE person1_id = least(*id_of_person_1*, *id_of_person_2*)
AND person2_id = greatest(*id_of_person_1*, *id_of_person_2*)
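The ordered-pair idea can be sketched end to end. This is only an illustration using Python's built-in sqlite3 module rather than PostgreSQL; the `note` column, the `ordered_pair` helper and the function names are assumptions made for the example, and the CHECK constraint is one possible way to enforce the ordering.

```python
import sqlite3


def ordered_pair(a, b):
    """Return the two ids as (smaller, larger) so each pair has one canonical row."""
    return (a, b) if a < b else (b, a)


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE persons (id INTEGER PRIMARY KEY);
    CREATE TABLE contacts (
        person1_id INTEGER NOT NULL REFERENCES persons(id),
        person2_id INTEGER NOT NULL REFERENCES persons(id),
        note TEXT,                       -- hypothetical extra column
        PRIMARY KEY (person1_id, person2_id),
        CHECK (person1_id < person2_id)  -- enforce the canonical order
    );
    INSERT INTO persons (id) VALUES (1), (2), (3);
""")


def add_contact(conn, a, b, note):
    # The pair is normalised before it is written.
    conn.execute("INSERT INTO contacts VALUES (?, ?, ?)", (*ordered_pair(a, b), note))


def find_contact(conn, a, b):
    # The same normalisation on lookup means no OR is needed.
    return conn.execute(
        "SELECT note FROM contacts WHERE person1_id = ? AND person2_id = ?",
        ordered_pair(a, b),
    ).fetchone()


add_contact(conn, 3, 1, "met at a conference")
```

Calling `find_contact(conn, 1, 3)` and `find_contact(conn, 3, 1)` then hit the same single row, and the extra data is stored only once.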

Why don't you use Join between the tables?
something like this:
FROM contact c INNER JOIN person p ON p.id = c.person1_id
Then add the WHERE and GROUP BY clauses you need to complete your query =)

Try this one mate =)
SELECT c.person1_id AS id_person_1, c.person2_id AS id_person_2, p1.name AS name_person_1, p2.name AS name_person_2
FROM contact c
LEFT JOIN person p1 ON p1.id = c.person1_id
RIGHT JOIN person p2 ON p2.id = c.person2_id;
I don't know if it will work.. but give it a try mate =)

"Insert the relation twice so that person1_id = id of person1 AND person2_id = id of person2 is enough"
That is how I'd do it, personally. It lets you handle the situation where A has the contact details of B but not the other way around (e.g. a girl gives a guy her number at the bar saying "call me" as she walks out). It also makes the queries simpler.
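A minimal sketch of the two-row approach, again with Python's sqlite3 module purely for illustration. The function names are hypothetical; the point is that a mutual contact is written as two rows in one transaction, while a one-way contact is a single row, and the lookup needs no OR.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE contacts (
        person1_id INTEGER NOT NULL,
        person2_id INTEGER NOT NULL,
        PRIMARY KEY (person1_id, person2_id)
    );
""")


def add_mutual_contact(conn, a, b):
    # "with conn" commits both rows together or rolls both back on error.
    with conn:
        conn.executemany(
            "INSERT INTO contacts (person1_id, person2_id) VALUES (?, ?)",
            [(a, b), (b, a)],
        )


def has_contact(conn, owner, other):
    # A plain equality lookup: does `owner` have `other` as a contact?
    row = conn.execute(
        "SELECT 1 FROM contacts WHERE person1_id = ? AND person2_id = ?",
        (owner, other),
    ).fetchone()
    return row is not None


add_mutual_contact(conn, 1, 2)                          # both directions
with conn:
    conn.execute("INSERT INTO contacts VALUES (3, 4)")  # one-way: 3 knows 4
```

The cost is the one raised in the question: any additional columns on the relation have to be written to both rows and kept in sync.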


Linq GroupJoin (join...into) results in INNER JOIN?

I am referencing the accepted answer to this question:
LINQ to SQL multiple tables left outer join
In my example, I need all of the Person records regardless if there is a matching Staff record.
I am using the following query (simplified for illustration's sake):
var result = from person in context.Person
             join staffQ in context.Staff
                 on person.StaffID equals staffQ.ID into staffStaffIDGroup
             from staff in staffStaffIDGroup.DefaultIfEmpty()
             select new PersonModel()
             {
                 ID = person.ID,
                 Fname = person.Fname,
                 Lname = person.Lname,
                 Sex = person.Sex,
                 Username = staff != null ? staff.Username : ""
             };
However, contrary to my expectations, the query results in the following SQL with an INNER JOIN, which eliminates records I need in the result set.
SELECT
[Extent1].[ID] AS [ID],
[Extent1].[fname] AS [fname],
[Extent1].[lname] AS [lname],
[Extent1].[sex] AS [sex],
[Extent2].[username] AS [username]
FROM [dbo].[Person] AS [Extent1]
INNER JOIN [dbo].[Staff] AS [Extent2] ON [Extent1].[StaffID] = [Extent2].[ID]
I thought that GroupJoin (or join...into) is supposed to get around this? I know I must have made a dumb mistake here, but I can't see it.
In general the query should generate a left outer join.
But remember, this is EF, and it has additional information coming from the model. In this case it looks like the StaffID property of Person is an enforced FK constraint to Staff, so EF knows that there is always a corresponding record in the Staff table. It therefore ignores your left outer join construct and generates an inner join instead.
Again, the model (properties, whether they are required or not, the relationships and whether they are required, etc.) allows EF to make similar smart decisions and optimizations.
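The reasoning behind that optimization can be checked with plain SQL. The sketch below uses Python's sqlite3 module, not EF, and the sample rows are invented: when every Person has a matching Staff row, the inner and left outer joins return the same rows, and they differ only when StaffID can be NULL. If the database allows NULLs but the model declares the relationship as required, the inner join silently drops those rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Staff (ID INTEGER PRIMARY KEY, Username TEXT);
    CREATE TABLE Person (
        ID INTEGER PRIMARY KEY,
        Fname TEXT,
        StaffID INTEGER REFERENCES Staff(ID)   -- nullable in this sketch
    );
    INSERT INTO Staff VALUES (10, 'asmith');
    INSERT INTO Person VALUES (1, 'Ann', 10), (2, 'Bob', NULL);
""")

QUERY = """
    SELECT p.Fname, COALESCE(s.Username, '')
    FROM Person p {join} Staff s ON p.StaffID = s.ID
    ORDER BY p.ID
"""

# With a NULL StaffID present, the two join types disagree.
inner_rows = conn.execute(QUERY.format(join="INNER JOIN")).fetchall()
left_rows = conn.execute(QUERY.format(join="LEFT OUTER JOIN")).fetchall()

# Once every Person has a matching Staff row, the two joins agree.
conn.execute("UPDATE Person SET StaffID = 10 WHERE ID = 2")
inner_after = conn.execute(QUERY.format(join="INNER JOIN")).fetchall()
left_after = conn.execute(QUERY.format(join="LEFT OUTER JOIN")).fetchall()
```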
Use a Navigation Property instead of a Join. If you're using a Join in EF LINQ you're almost always doing the wrong thing.
Something like
var result = from person in context.Person
             select new PersonModel()
             {
                 ID = person.ID,
                 Fname = person.Fname,
                 Lname = person.Lname,
                 Sex = person.Sex,
                 Username = person.StaffID != null ? person.Staff.Username : ""
             };

Select most reviewed courses starting from courses having at least 2 reviews

I'm using Flask-SQLAlchemy with PostgreSQL. I have the following two models:
class Course(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    course_name = db.Column(db.String(120))
    course_description = db.Column(db.Text)
    course_reviews = db.relationship('Review', backref='course', lazy='dynamic')

class Review(db.Model):
    __table_args__ = (db.UniqueConstraint('course_id', 'user_id'), {})
    id = db.Column(db.Integer, primary_key=True)
    review_date = db.Column(db.DateTime)
    review_comment = db.Column(db.Text)
    rating = db.Column(db.SmallInteger)
    course_id = db.Column(db.Integer, db.ForeignKey(''))
    user_id = db.Column(db.Integer, db.ForeignKey(''))
I want to select the courses that are most reviewed, starting with at least two reviews. The following SQLAlchemy query worked fine with SQLite:
most_rated_courses = db.session.query(models.Review, func.count(models.Review.course_id)).\
    group_by(models.Review.course_id).\
    having(func.count(models.Review.course_id) > 1).\
    order_by(func.count(models.Review.course_id).desc()).all()
But when I switched to PostgreSQL in production it gives me the following error:
ProgrammingError: (ProgrammingError) column "review.id" must appear in the GROUP BY clause or be used in an aggregate function
LINE 1: SELECT review.id AS review_id, review.review_date AS review_...
'SELECT review.id AS review_id, review.review_date AS review_review_date, review.review_comment AS review_review_comment, review.rating AS review_rating, review.course_id AS review_course_id, review.user_id AS review_user_id, count(review.course_id) AS count_1 \nFROM review GROUP BY review.course_id \nHAVING count(review.course_id) > %(count_2)s ORDER BY count(review.course_id) DESC' {'count_2': 1}
I tried to fix the query by adding models.Review in the GROUP BY clause but it did not work:
most_rated_courses = db.session.query(models.Review, func.count(models.Review.course_id)).\
    group_by(models.Review.course_id).\
    having(func.count(models.Review.course_id) > 1).\
    order_by(func.count(models.Review.course_id).desc()).all()
Can anyone please help me with this issue? Thanks a lot.
SQLite and MySQL both have the behavior that they allow a query that has aggregates (like count()) without applying GROUP BY to all other columns - which in terms of standard SQL is invalid, because if more than one row is present in that aggregated group, it has to pick the first one it sees for return, which is essentially random.
So your query for Review basically returns to you the first "Review" row for each distinct course id - like for course id 3, if you had seven "Review" rows, it's just choosing an essentially random "Review" row within the group of "course_id=3". I gather the answer you really want, "Course", is available here because you can take that semi-randomly selected Review object and just call ".course" on it, giving you the correct Course, but this is a backwards way to go.
But once you get on a proper database like PostgreSQL you need to use correct SQL. The data you need from the "review" table is just the course_id and the count, nothing else, so query just for that (first assume we don't actually need to display the counts; that comes in a minute):
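most_rated_course_ids = session.query(Review.course_id).\
    group_by(Review.course_id).\
    having(func.count(Review.course_id) > 1).\
    order_by(func.count(Review.course_id).desc()).all()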
but that's not your Course object - you want to take that list of ids and apply it to the course table. We first need to keep our list of course ids as a SQL construct, instead of loading the data - that is, turn it into a derived table by converting the query into a subquery (change the word .all() to .subquery()):
most_rated_course_id_subquery = session.query(Review.course_id).\
    group_by(Review.course_id).\
    having(func.count(Review.course_id) > 1).\
    subquery()
one simple way to link that to Course is to use an IN:
courses = session.query(Course).filter(
    Course.id.in_(most_rated_course_id_subquery)).all()
but that's essentially going to throw away the "ORDER BY" you're looking for and also doesn't give us any nice way of actually reporting on those counts along with the course results. We need to have that count along with our Course so that we can report it and also order by it. For this we use a JOIN from the "course" table to our derived table. SQLAlchemy is smart enough to know to join on the "course_id" foreign key if we just call join():
courses = session.query(Course).join(most_rated_course_id_subquery).all()
then to get at the count, we need to add that to the columns returned by our subquery along with a label so we can refer to it:
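most_rated_course_id_subquery = session.query(
        Review.course_id, func.count(Review.course_id).label("count")).\
    group_by(Review.course_id).\
    having(func.count(Review.course_id) > 1).\
    subquery()
courses = session.query(
        Course, most_rated_course_id_subquery.c.count
    ).join(most_rated_course_id_subquery).\
    order_by(most_rated_course_id_subquery.c.count.desc()).all()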
A great article I like to point out to people about GROUP BY and this kind of query is SQL GROUP BY techniques which points out the common need for the "select from A join to (subquery of B with aggregate/GROUP BY)" pattern.
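For reference, here is the "select from A join to (subquery of B with aggregate/GROUP BY)" pattern written out as raw SQL. This is a sketch run through Python's sqlite3 module rather than through SQLAlchemy, with invented sample rows and a hypothetical `review_count` label; the SQL itself only selects grouped or aggregated columns inside the subquery, so it is also valid on PostgreSQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE course (id INTEGER PRIMARY KEY, course_name TEXT);
    CREATE TABLE review (
        id INTEGER PRIMARY KEY,
        course_id INTEGER REFERENCES course(id),
        user_id INTEGER,
        UNIQUE (course_id, user_id)
    );
    INSERT INTO course VALUES (1, 'Algebra'), (2, 'Biology'), (3, 'Chemistry');
    INSERT INTO review (course_id, user_id) VALUES
        (1, 1), (1, 2),
        (2, 1), (2, 2), (2, 3),
        (3, 1);
""")

# The derived table holds one row per course_id with its review count;
# the outer query joins it back to course and orders by that count.
rows = conn.execute("""
    SELECT course.course_name, counts.review_count
    FROM course
    JOIN (
        SELECT course_id, COUNT(*) AS review_count
        FROM review
        GROUP BY course_id
        HAVING COUNT(*) > 1
    ) AS counts ON counts.course_id = course.id
    ORDER BY counts.review_count DESC
""").fetchall()
```

Chemistry has a single review, so the HAVING clause filters it out before the join.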

Query produced for IN filter on 1-1 relation joins to parent table twice

I have this problem and reproduced it with AdventureWorks2008R2 to make it easier to demonstrate. Basically, I want to filter a parent table by a list of IN values, and I thought it would generate this type of query:
SELECT * FROM SalesOrderDetail where EXISTS( select * from SalesOrderHeader where and rowguid IN ('asdf', 'fff', 'weee' ) )
but it doesn't.
Any ideas how to change the LINQ statement to query Header only once?
(ignore the fact I'm matching on Guids - it will actually be integers; I was just quickly looking for a 1-1 table in EF because that's when the problem occurs and I happened to find these)
var guidsToFind = new Guid[] { Guid.NewGuid(), Guid.NewGuid(), Guid.NewGuid()};
AdventureWorks2008R2Entities context = new AdventureWorks2008R2Entities();
var g = context.People.Where(p => guidsToFind.Contains(p.BusinessEntity.rowguid)).ToList();
That produces the following more expensive query:
SELECT [Extent1].[BusinessEntityID] AS [BusinessEntityID],
[Extent1].[PersonType] AS [PersonType],
[Extent1].[NameStyle] AS [NameStyle],
[Extent1].[Title] AS [Title],
[Extent1].[FirstName] AS [FirstName],
[Extent1].[MiddleName] AS [MiddleName],
[Extent1].[LastName] AS [LastName],
[Extent1].[Suffix] AS [Suffix],
[Extent1].[EmailPromotion] AS [EmailPromotion],
[Extent1].[AdditionalContactInfo] AS [AdditionalContactInfo],
[Extent1].[Demographics] AS [Demographics],
[Extent1].[rowguid] AS [rowguid],
[Extent1].[ModifiedDate] AS [ModifiedDate]
FROM [Person].[Person] AS [Extent1]
INNER JOIN [Person].[BusinessEntity] AS [Extent2] ON [Extent1].[BusinessEntityID] = [Extent2].[BusinessEntityID]
LEFT OUTER JOIN [Person].[BusinessEntity] AS [Extent3] ON [Extent1].[BusinessEntityID] = [Extent3].[BusinessEntityID]
WHERE [Extent2].[rowguid] = cast('b95b63f9-6304-4626-8e70-0bd2b73b6b0f' as uniqueidentifier) OR [Extent3].[rowguid] IN (cast('f917a037-b86b-4911-95f4-4afc17433086' as uniqueidentifier),cast('3188557d-5df9-40b3-90ae-f83deee2be05' as uniqueidentifier))
Really odd. Looks like a LINQ limitation.
I don't have a system to try this on right now, but if you first get a list of BusinessEntityID values based on the provided guids and then get the persons like this
var g = context.People.Where(p => businessEntityIdList.Contains(p.BusinessEntityID)).ToList();
there should not be a reason for additional unnecessary joins anymore.
If that works, you can try to combine the two steps into one LINQ expression to see if the separation stays intact.
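The two-step idea is easier to see in plain SQL. The sketch below uses Python's sqlite3 module with a cut-down, invented version of the two tables (only the key, `rowguid` and `FirstName`), so it says nothing about what EF will actually generate; it only shows that once the ids are known, the Person query needs no join at all. The `placeholders` helper is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE BusinessEntity (BusinessEntityID INTEGER PRIMARY KEY, rowguid TEXT);
    CREATE TABLE Person (
        BusinessEntityID INTEGER PRIMARY KEY REFERENCES BusinessEntity(BusinessEntityID),
        FirstName TEXT
    );
    INSERT INTO BusinessEntity VALUES (1, 'aaa'), (2, 'bbb'), (3, 'ccc');
    INSERT INTO Person VALUES (1, 'Ken'), (2, 'Terri'), (3, 'Rob');
""")


def placeholders(values):
    """Build "?, ?, ?" for a parameterised IN list."""
    return ", ".join("?" for _ in values)


guids_to_find = ["aaa", "ccc"]

# Step 1: resolve the guids to key values; only BusinessEntity is read.
id_list = [row[0] for row in conn.execute(
    f"SELECT BusinessEntityID FROM BusinessEntity "
    f"WHERE rowguid IN ({placeholders(guids_to_find)}) ORDER BY BusinessEntityID",
    guids_to_find,
)]

# Step 2: filter Person on its own key column; no join is needed.
people = conn.execute(
    f"SELECT FirstName FROM Person "
    f"WHERE BusinessEntityID IN ({placeholders(id_list)}) ORDER BY BusinessEntityID",
    id_list,
).fetchall()
```

The trade-off is an extra round trip to the database in exchange for two simple single-table queries.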

JPA Query over a join table

I have 3 tables like:
A             AB           B
------------- ------------ ---------------
a1            a1,b1        b1
AB is a transition table between A and B
With this setup, neither of the two classes holds a reference to the other. But I want to find out, with a JPQL query, whether any records exist in the AB table for a given element of the A table. A count or a boolean value is all I need.
Because AB is a transition table, there is no model object for it, and I want to know if I can do this with a @Query in my Repository object.
The AB table must be modeled in an entity to be queried in JPQL. So you must model it either as
its own entity class or as an association in your A and/or your B entity.
I suggest using a native query instead of JPQL (JPA supports native queries too). Let us assume table A is Customer, table B is Product, and AB is Sale. Here is the query for getting the list of products ordered by a customer.
entityManager.createNativeQuery("SELECT PRODUCT_ID FROM
Actually, the answer to this situation is simpler than you might think. It's a simple matter of using the right tool for the right job. JPA was not designed for implementing complicated SQL queries, that's what SQL is for! So you need a way to get JPA to access a production-level SQL query;
So in your case what you want to do is access the AB table looking only for the id field. Once you have retrieved your query, take your id field and look up the Java object using the id field. It's a second search true, but trivial by SQL standards.
Let's assume you are looking for an A object based on the number of times a B object references it. Say you are wanting a semi-complicated (but typical) SQL query to group type A objects based on the number of B objects and in descending order. This would be a typical popularity query that you might want to implement as per project requirements.
Your native SQL query would be as such:
select a_id as id from AB group by a_id order by count(*) desc;
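To see what this query returns and how the second lookup works, here is a small sketch using Python's sqlite3 module instead of JPA. The table contents and the `name` column are invented for the example, and `has_ab_rows` is a hypothetical helper for the existence check the question originally asked about.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE B (id INTEGER PRIMARY KEY);
    CREATE TABLE AB (a_id INTEGER REFERENCES A(id), b_id INTEGER REFERENCES B(id));
    INSERT INTO A VALUES (1, 'first'), (2, 'second'), (3, 'unreferenced');
    INSERT INTO B VALUES (1), (2), (3);
    INSERT INTO AB VALUES (1, 1), (2, 1), (2, 2), (2, 3);
""")

# The popularity query from the text: ids of A ordered by how often AB references them.
popular_ids = [row[0] for row in conn.execute(
    "select a_id as id from AB group by a_id order by count(*) desc"
)]

# Second lookup: fetch each A row by id, preserving the popularity order.
popular_a = [
    conn.execute("SELECT id, name FROM A WHERE id = ?", (a_id,)).fetchone()
    for a_id in popular_ids
]


def has_ab_rows(conn, a_id):
    """Return True if at least one AB row references the given A id."""
    return conn.execute(
        "SELECT EXISTS (SELECT 1 FROM AB WHERE a_id = ?)", (a_id,)
    ).fetchone()[0] == 1
```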
Now what you want to do is tell JPA to expect the id list to come back in a form that JPA can accept. You need to put together an extra JPA entity, one that will never be used in the normal fashion of JPA. But JPA needs a way to get the queried objects back to you. You would put together an entity for this search query as such;
@Entity
public class IdSearch {
    @Id
    Long id;
    public Long getId() { return id; }
    public void setId(Long id) { this.id = id; }
}
Now you implement a little bit of code to bring the two technologies together;
public List<IdSearch> findMostPopularA() {
    return em.createNativeQuery("select a_id as id from AB group by a_id "
            + "order by count(*) desc", IdSearch.class).getResultList();
}
There, that's all you have to do to get JPA to complete your query successfully. To get at your A objects you would simply look each id up using the traditional JPA approach, as such;
List<IdSearch> list = producer.findMostPopularA();
Iterator<IdSearch> it = list.iterator();
while ( it.hasNext() ) {
    IdSearch a = it.next();
    A object = em.find(A.class, a.getId());
    // you're in business!
}
Still, a little more refinement of the above can simplify things further, given the many capabilities of SQL. A slightly more complicated SQL query will give an even more direct JPA interface to your actual data;
public List<A> findMostPopularA() {
    return em.createNativeQuery("select A.* from A, AB "
            + "where A.id = AB.a_id "
            + "group by A.id "
            + "order by count(*) desc", A.class).getResultList();
}
This removes the need for an interim IdSearch entity!
List<A> list = producer.findMostPopularA();
Iterator<A> it = list.iterator();
while ( it.hasNext() ) {
    A a = it.next();
    // you're in business!
}
What may not be clear to the naked eye is the wonderfully simplified way JPA allows you to make use of complicated SQL structures inside the JPA interface. Imagine you have an SQL query as follows;
SELECT array_agg(players), player_teams
FROM (
  SELECT DISTINCT t1.t1player AS players, t1.player_teams
  FROM (
    SELECT
      p.playerid AS t1id,
      concat(p.playerid,':', p.playername, ' ') AS t1player,
      array_agg(pl.teamid ORDER BY pl.teamid) AS player_teams
    FROM player p
    LEFT JOIN plays pl ON p.playerid = pl.playerid
    GROUP BY p.playerid, p.playername
  ) t1
  INNER JOIN (
    SELECT
      p.playerid AS t2id,
      array_agg(pl.teamid ORDER BY pl.teamid) AS player_teams
    FROM player p
    LEFT JOIN plays pl ON p.playerid = pl.playerid
    GROUP BY p.playerid, p.playername
  ) t2 ON t1.player_teams=t2.player_teams AND t1.t1id <> t2.t2id
) innerQuery
GROUP BY player_teams
The point is that with the createNativeQuery interface, you can still retrieve precisely the data you are looking for, straight into the desired object for easy access from Java.
public List<A> findMostPopularA() {
    // A text block (Java 15+) keeps the multi-line SQL in one string literal.
    return em.createNativeQuery("""
        SELECT array_agg(players), player_teams
        FROM (
          SELECT DISTINCT t1.t1player AS players, t1.player_teams
          FROM (
            SELECT
              p.playerid AS t1id,
              concat(p.playerid,':', p.playername, ' ') AS t1player,
              array_agg(pl.teamid ORDER BY pl.teamid) AS player_teams
            FROM player p
            LEFT JOIN plays pl ON p.playerid = pl.playerid
            GROUP BY p.playerid, p.playername
          ) t1
          INNER JOIN (
            SELECT
              p.playerid AS t2id,
              array_agg(pl.teamid ORDER BY pl.teamid) AS player_teams
            FROM player p
            LEFT JOIN plays pl ON p.playerid = pl.playerid
            GROUP BY p.playerid, p.playername
          ) t2 ON t1.player_teams=t2.player_teams AND t1.t1id <> t2.t2id
        ) innerQuery
        GROUP BY player_teams
        """, A.class).getResultList();
}

Entity Sql for a Many to Many relationship

Consider two tables Bill and Product with a many to many relationship. How do you get all the bills for a particular product using Entity Sql?
Something like this
SELECT B FROM [Container].Products as P
WHERE P.ProductID == 1
will produce a row for each Bill
Another option is something like this:
FROM [Container].Products AS P
WHERE P.ProductID == 1
Which will produce a row for each matching Product (in this case just one)
and the second column in the row will include a nested result set containing the bills for that product.
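The two result shapes described above (one row per Bill versus one row per Product with its bills nested) can be illustrated outside Entity SQL. The sketch below uses Python's sqlite3 module; the `BillProduct` junction table, the column names other than ProductID, and the sample rows are all assumptions made for the example, since the source does not show the underlying schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Product (ProductID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Bill (BillID INTEGER PRIMARY KEY, Total REAL);
    -- Hypothetical junction table; the source does not name it.
    CREATE TABLE BillProduct (
        BillID INTEGER REFERENCES Bill(BillID),
        ProductID INTEGER REFERENCES Product(ProductID),
        PRIMARY KEY (BillID, ProductID)
    );
    INSERT INTO Product VALUES (1, 'Widget'), (2, 'Gadget');
    INSERT INTO Bill VALUES (100, 9.5), (101, 20.0), (102, 3.0);
    INSERT INTO BillProduct VALUES (100, 1), (101, 1), (102, 2);
""")


def bills_for_product(conn, product_id):
    """Flat shape: one row per Bill linked to the product."""
    return conn.execute(
        """
        SELECT b.BillID, b.Total
        FROM Bill b
        JOIN BillProduct bp ON bp.BillID = b.BillID
        WHERE bp.ProductID = ?
        ORDER BY b.BillID
        """,
        (product_id,),
    ).fetchall()


def product_with_bills(conn, product_id):
    """Nested shape: the product row paired with the list of its bills."""
    product = conn.execute(
        "SELECT ProductID, Name FROM Product WHERE ProductID = ?", (product_id,)
    ).fetchone()
    return product, bills_for_product(conn, product_id)
```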
Hope this helps
You need to use some LINQ like this;
using (YourEntities ye = new YourEntities())
{
    Product myProduct = ye.Product.First(p => p.ProductId == idParameter);
    myProduct.Bill.Load();        // Load() returns void; it populates the collection
    var bills = myProduct.Bill;
}
This assumes that you have used the Entity Framework to build a model for your data.
The bills variable will hold a collection of Bill objects that are related to your product object.
Hope it helps.