How to store and query a database with tree structure

I'm a member of an MLM network and I'm also a developer. My question is about the database structure for MLM software with unlimited levels. Example:
Person 1 (6000 people in his network, but only 4 directly linked to him)
How should I store that data and query how many points his network produces?
I could possibly do it using a many-to-many relationship, but once we have a lot of users and a huge network, it costs a lot to query and loop through these records.

In any database, if each member of the "tree" has the same properties, it's best to use a self-referencing table, especially if each node has one and only one direct parent.
For example:
HR
------
ID
first_name
last_name
department_id
sal
boss_hr_id (references HR.ID)
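Usually the big boss would have a NULL boss_hr_id.
To query such a structure in Postgres, you can use a recursive CTE (the "with recursive" statement).
For the table above, a query like this will return the person and everyone below them:
with recursive ret(id, first_name, last_name, department_id, boss_hr_id, lev) as
(
-- anchor row: the person whose structure you are querying
select hr.id, hr.first_name, hr.last_name, hr.department_id, hr.boss_hr_id, 0 from hr
where hr.id = **ID_OF_PERSON_YOU_ARE_QUERYING_STRUCTURE**
union all
-- recursive step: everyone whose boss is already in the result
select hr.id, hr.first_name, hr.last_name, hr.department_id, hr.boss_hr_id, ret.lev + 1 from hr
inner join ret on hr.boss_hr_id = ret.id
)
select * from ret;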
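A minimal sketch of the same idea applied to the question's "points of the whole network" case, run against SQLite from Python so that it is self-contained (PostgreSQL accepts the same WITH RECURSIVE syntax). The member table, its sponsor_id and points columns, the network_totals helper and the sample rows are all assumptions made up for illustration; none of them come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE member (
        id         INTEGER PRIMARY KEY,
        name       TEXT NOT NULL,
        points     INTEGER NOT NULL,
        sponsor_id INTEGER REFERENCES member(id)  -- NULL at the top of a tree
    );
    INSERT INTO member VALUES
        (1, 'root',  10, NULL),
        (2, 'alice', 20, 1),
        (3, 'bob',   30, 1),
        (4, 'carol', 40, 2),
        (5, 'dave',  50, 4),
        (6, 'erin',  60, NULL);  -- a separate tree, must not be counted
""")

DOWNLINE_SQL = """
    WITH RECURSIVE downline(id, points, lev) AS (
        -- anchor row: the member whose network we want
        SELECT id, points, 0 FROM member WHERE id = ?
        UNION ALL
        -- recursive step: members sponsored by someone already in the result
        SELECT m.id, m.points, d.lev + 1
        FROM member AS m
        JOIN downline AS d ON m.sponsor_id = d.id
    )
    -- lev > 0 excludes the member themselves from the totals
    SELECT COUNT(*), SUM(points), MAX(lev) FROM downline WHERE lev > 0
"""

def network_totals(member_id):
    """Return (size, total_points, depth) of the network below member_id."""
    return conn.execute(DOWNLINE_SQL, (member_id,)).fetchone()

size, total_points, depth = network_totals(1)
print(size, total_points, depth)
```

The aggregation happens inside the database, so the application never loops over the individual rows. With a large network, an index on the parent column (sponsor_id here) is probably what matters most for the recursive step.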

Related

SQL Natural Join

Okay. So the question that I got asked by the teacher was this:
(5 marks) Construct a SQL query on the dvdrental database that uses a natural join of two or more tables and an additional where condition. (E.g. find the titles of films rented by a particular customer.) Note the hints on the course news page if your query returns nothing.
Here is the layout of the database I'm working with:
http://www.postgresqltutorial.com/wp-content/uploads/2013/05/PostgreSQL-Sample-Database.png
The hint to us was this:
PostgreSQL hint:
If a natural join doesn't produce any results in the dvdrental DB, it is because many tables have a last_update timestamp field, and thus the natural join tries to join on that field as well as the intended field.
e.g.
select *
from film natural join inventory;
does not work because of this - it produces an empty table (no results).
Instead, use
select *
from film, inventory
where film.film_id = inventory.film_id;
This is what I did:
select *
from film, customer
where film.film_id = customer.customer_id;
The problem is I cannot get a particular customer.
I tried doing customer_id = 2; but it returns an error.
Really need help!

Well, it seems that you are trying to join two tables that have no direct relation to each other. Here is your issue:
where film.film_id = customer.customer_id
To find which films are rented by which customer you would have to join customer table with rental, then with inventory and finally with film.
The task description states:
Construct a SQL query on the dvdrental database that uses a natural join of two or more tables and an additional where condition.
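A rough sketch of both points, using SQLite from Python so it runs anywhere. The four tables are a heavily simplified stand-in for dvdrental (only the columns needed here, with invented rows), so treat the schema as an assumption. It shows the natural join coming back empty because of last_update, the explicit customer → rental → inventory → film chain, and one possible way to still use a natural join by projecting last_update away first.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE film      (film_id INTEGER PRIMARY KEY, title TEXT, last_update TEXT);
    CREATE TABLE inventory (inventory_id INTEGER PRIMARY KEY, film_id INTEGER, last_update TEXT);
    CREATE TABLE customer  (customer_id INTEGER PRIMARY KEY, first_name TEXT, last_update TEXT);
    CREATE TABLE rental    (rental_id INTEGER PRIMARY KEY, inventory_id INTEGER,
                            customer_id INTEGER, last_update TEXT);
    INSERT INTO film      VALUES (1, 'Alpha Film', '2013-05-26'), (2, 'Beta Film', '2013-05-26');
    INSERT INTO inventory VALUES (10, 1, '2006-02-15'), (11, 2, '2006-02-15');
    INSERT INTO customer  VALUES (2, 'Pat', '2013-05-27'), (3, 'Sam', '2013-05-27');
    INSERT INTO rental    VALUES (100, 10, 2, '2006-02-16'), (101, 11, 3, '2006-02-16');
""")

# Joins on film_id AND last_update, and the timestamps never match.
naive = conn.execute("SELECT * FROM film NATURAL JOIN inventory").fetchall()

# Explicit join conditions along the real relationships.
explicit = conn.execute("""
    SELECT film.title
    FROM customer
    JOIN rental    ON rental.customer_id = customer.customer_id
    JOIN inventory ON inventory.inventory_id = rental.inventory_id
    JOIN film      ON film.film_id = inventory.film_id
    WHERE customer.customer_id = 2
""").fetchall()

# Natural join over subqueries that leave last_update out,
# so only the intended key columns are shared.
projected = conn.execute("""
    SELECT title
    FROM (SELECT film_id, title FROM film) AS f
    NATURAL JOIN (SELECT inventory_id, film_id FROM inventory) AS i
    NATURAL JOIN (SELECT inventory_id, customer_id FROM rental) AS r
    WHERE customer_id = 2
""").fetchall()

print(naive, explicit, projected)
```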

Postgres subquery has access to column in a higher level table. Is this a bug? or a feature I don't understand?

I don't understand why the following doesn't fail. How does the subquery have access to a column from a different table at the higher level?
drop table if exists temp_a;
create temp table temp_a as
(
select 1 as col_a
);
drop table if exists temp_b;
create temp table temp_b as
(
select 2 as col_b
);
select col_a from temp_a where col_a in (select col_a from temp_b);
/*why doesn't this fail?*/
The following fail, as I would expect them to.
select col_a from temp_b;
/*ERROR: column "col_a" does not exist*/
select * from temp_a cross join (select col_a from temp_b) as sq;
/*ERROR: column "col_a" does not exist
*HINT: There is a column named "col_a" in table "temp_a", but it cannot be referenced from this part of the query.*/
I know about the LATERAL keyword, but I'm not using LATERAL here. Also, this query succeeds even in versions of Postgres before 9.3 (when the LATERAL keyword was introduced).
Here's a sqlfiddle: http://sqlfiddle.com/#!10/09f62/5/0
Thank you for any insights.

Although this feature might be confusing, without it several types of queries would be more difficult, slower, or impossible to write in SQL. This feature is called a "correlated subquery", and the correlation can serve a similar function to a join.
For example: Consider this statement
select first_name, last_name from users u
where exists (select * from orders o where o.user_id=u.user_id)
This query gets the names of all the users who have ever placed an order. You could get the same information with a join to the orders table, but you'd also have to use "distinct", which would internally require a sort and would likely perform a tad worse than this query. You could also produce a similar query with a group by.
Here's a better example that's pretty practical, and not just for performance reasons. Suppose you want to delete all users who have no orders and no tickets.
delete from users u where
not exists (select * from orders o where o.user_id = u.user_id)
and not exists (select * from tickets t where t.user_id = u.user_id)
One very important thing to note is that you should fully qualify or alias your table names when doing this or you might wind up with a typo that completely messes up the query and silently "just works" while returning bad data.
The following is an example of what NOT to do.
select * from users
where exists (select * from product where last_updated_by=user_id)
This looks just fine until you look at the tables and realize that the table "product" has no "last_updated_by" field and the users table does. The unqualified name silently resolves to the outer table's column, so the query runs and returns the wrong data. Add the alias and the query will fail because no "last_updated_by" column exists in product.
I hope this has given you some examples that show you how to use this feature. I use them all the time in update and delete statements (as well as in selects-- but I find an absolute need for them in updates and deletes often)
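The name-resolution behaviour can be reproduced with the temp_a / temp_b tables from the question. This sketch uses SQLite from Python as a stand-in for Postgres (an assumption; SQLite resolves the unqualified column the same way, though its error message differs): the unqualified subquery runs because col_a falls back to the outer table, while the aliased version is rejected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE temp_a (col_a INTEGER);
    CREATE TABLE temp_b (col_b INTEGER);
    INSERT INTO temp_a VALUES (1);
    INSERT INTO temp_b VALUES (2);
""")

# Unqualified: temp_b has no col_a, so the name resolves to the outer
# temp_a.col_a and the subquery becomes a correlated subquery.
unqualified = conn.execute(
    "SELECT col_a FROM temp_a WHERE col_a IN (SELECT col_a FROM temp_b)"
).fetchall()

# Qualified: b.col_a can only mean a column of temp_b, which does not exist,
# so the statement fails instead of silently returning rows.
try:
    conn.execute(
        "SELECT a.col_a FROM temp_a AS a "
        "WHERE a.col_a IN (SELECT b.col_a FROM temp_b AS b)"
    )
    qualified_error = None
except sqlite3.OperationalError as exc:
    qualified_error = str(exc)

print(unqualified, qualified_error)
```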

Querying a table for rows with no foreign keys pointing to them

Let's say we have two tables in our database.
Table 1:
Employer:
id
name
Table 2:
Employee:
id
name
employer_id
My goal is to find a relatively efficient query to find all employers with no employees. The only way I could think to do this involves a subquery, but I suspect there should be a way to do this using a JOIN.
Here's what I originally had:
SELECT * FROM employer
WHERE employer.id NOT IN (
SELECT employer_id FROM employee
);
As I couldn't find any answers online for this, my solution is in the answers below (admittedly with a good chunk of help from a coworker). I'm using postgres 9.3 and sqlalchemy 0.8.

SELECT * FROM employer
LEFT OUTER JOIN employee ON employee.employer_id = employer.id
WHERE employee.id IS NULL;
For those not familiar with outer joins (I wasn't before this), the outer join returns all employer/employee pairs and also all employers without employees. The latter are paired with "dummy" employee rows in which every value is NULL. As a consequence, keeping only the rows where the employee id is NULL leaves you with just the employers without employees.
(If anyone can give a better-worded explanation, by all means please do.)
In sqlalchemy:
session.query(Employer).outerjoin(
    Employee, Employer.id == Employee.employer_id
).filter(Employee.id == None)
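A small runnable check of the outer-join approach, using SQLite from Python with invented sample rows (the employer and employee names are assumptions for illustration). It also runs a NOT EXISTS version of the same question so the two results can be compared.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employer (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE employee (
        id          INTEGER PRIMARY KEY,
        name        TEXT,
        employer_id INTEGER REFERENCES employer(id)
    );
    INSERT INTO employer VALUES (1, 'Acme'), (2, 'Globex'), (3, 'Initech');
    INSERT INTO employee VALUES (1, 'Ann', 1), (2, 'Bob', 1), (3, 'Cy', 3);
""")

# Outer join, then keep only the rows where no employee matched.
via_left_join = conn.execute("""
    SELECT employer.id, employer.name
    FROM employer
    LEFT OUTER JOIN employee ON employee.employer_id = employer.id
    WHERE employee.id IS NULL
    ORDER BY employer.id
""").fetchall()

# Same question phrased as a correlated NOT EXISTS.
via_not_exists = conn.execute("""
    SELECT er.id, er.name
    FROM employer AS er
    WHERE NOT EXISTS (SELECT 1 FROM employee AS ee WHERE ee.employer_id = er.id)
    ORDER BY er.id
""").fetchall()

print(via_left_join, via_not_exists)
```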

T-SQL - How to write query to get records that match ALL records in a many to many join

(I don't think I have titled this question correctly - but I don't know how to describe it)
Here is what I am trying to do:
Let's say I have a Person table that has a PersonID field. And let's say that a Person can belong to many Groups. So there is a Group table with a GroupID field and a GroupMembership table that is a many-to-many join between the two tables and the GroupMembership table has a PersonID field and a GroupID field. So far, it is a simple many to many join.
Given a list of GroupIDs I would like to be able to write a query that returns all of the people that are in ALL of those groups (not any one of those groups). And the query should be able to handle any number of GroupIDs. I would like to avoid dynamic SQL.
Is there some simple way of doing this that I am missing?
Thanks,
Corey

select person_id, count(*) from groupmembership
where group_id in ([your list of group ids])
group by person_id
having count(*) = [size of your list of group ids]
Edited: thank you dotjoe!

Basically you are looking for people for whom there is no group in your list that they are not a member of, so
select *
from Person p
where not exists (
select 1
from [Group] g
where g.GroupID in ([your list of group ids])
and not exists (
select 1
from GroupMembership gm
where gm.PersonID = p.PersonID
and gm.GroupID = g.GroupID
)
)

You're basically not going to avoid "dynamic" SQL in the sense of dynamically generating the query at query time. There's no way to hand a list around in SQL (well, there is, table variables, but getting them into the system from C# is either impossible (2005 & below) or else annoying (2008)).
One way that you could do it with multiple queries is to insert your list into a work table (probably a process-keyed table) and join against that table. The only other option would be to use a dynamic query such as the ones specified by Jonathan and hongliang.
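As a sketch of the work-table idea combined with a GROUP BY / HAVING count, here is a version whose SQL text never changes regardless of how many group ids are supplied. It runs on SQLite from Python rather than SQL Server, and the wanted_group table, the people_in_all_groups helper and the sample memberships are hypothetical names and data invented for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE groupmembership (person_id INTEGER, group_id INTEGER);
    CREATE TABLE wanted_group (group_id INTEGER PRIMARY KEY);  -- the work table
    INSERT INTO groupmembership VALUES
        (1, 10), (1, 20), (1, 30),
        (2, 10), (2, 20),
        (3, 20), (3, 30);
""")

# Fixed query text: the list of ids lives in wanted_group, not in the SQL.
ALL_GROUPS_SQL = """
    SELECT gm.person_id
    FROM groupmembership AS gm
    JOIN wanted_group AS w ON w.group_id = gm.group_id
    GROUP BY gm.person_id
    HAVING COUNT(DISTINCT gm.group_id) = (SELECT COUNT(*) FROM wanted_group)
    ORDER BY gm.person_id
"""

def people_in_all_groups(group_ids):
    """Return ids of people who belong to every group in group_ids."""
    conn.execute("DELETE FROM wanted_group")
    conn.executemany(
        "INSERT INTO wanted_group (group_id) VALUES (?)",
        [(g,) for g in set(group_ids)],  # set() drops duplicate ids
    )
    return [row[0] for row in conn.execute(ALL_GROUPS_SQL)]

print(people_in_all_groups([10, 20]))
```

COUNT(DISTINCT ...) is used instead of COUNT(*) so that a duplicated membership row would not inflate the count. In a multi-user system the work table would need to be keyed per process or be a temporary table.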

How can I write a SQL select statement to include a lookup from another table?

I'm copying data from one database to another and massaging the data while I'm at it. Both databases have tables called Clients and Jobs.
However, in database "Alpha" the Jobs table does not have a relationship to the Clients table, whereas in database "Epsilon" it does. Alpha's Jobs table just has the client's name in an nvarchar column.
I need a select statement to look up the client's ID in the Clients table by name while I am inserting into the Jobs table in Epsilon.
My unfinished SQL statement looks like this:
insert into Epsilon.dbo.Jobs (ClientId, Name, Location, DateCreated)
select ????, Name, Location, DateCreated from Alpha.dbo.Jobs
How can I modify this so that the ???? contains the ClientId from the Clients table in Epsilon? I know I need to look up the data using the Name column in Jobs, but I can't figure out the syntax for this.

What you need is a join. Joins, contrary to what pretty much everybody thinks when starting out, don't require defined relationships in the schema of the database. They just require that the two columns you're comparing have the same type (edit: see comments).
The question is which join do you want. Because there isn't a relationship defined, there may be clients that have jobs and clients that don't, and jobs that have clients and jobs that don't.
I'm assuming that you want all JOBS that exist, and where a ClientId matches the CLIENTS table bring in the ClientId, and where that relationship doesn't exist to leave the ClientId null. We can do this with a LEFT JOIN. Jobs LEFT JOIN Clients will bring in all records on the LEFT, even where the relationship defined with Clients on the right doesn't exist. We could reverse the two and do a RIGHT JOIN, but that's not what people usually do. I'll leave it to you to read up on other types of joins and how they work.
So your select statement would look like:
select c.ClientId, j.Name, j.Location, j.DateCreated
from Alpha.dbo.Jobs as j LEFT JOIN
Epsilon.dbo.Clients as c ON j.ClientName = c.ClientName
If Jobs.ClientName is not the same data type as c.ClientName, you can edit the schema before running the query to bring them in line with each other.
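A hedged, runnable sketch of the LEFT JOIN lookup feeding the insert. It uses SQLite from Python, so the cross-database names become plain tables (alpha_jobs, epsilon_clients, epsilon_jobs); those table names, the ClientName and Name columns used for matching, and the sample rows are assumptions, since the question does not give the real column names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE alpha_jobs (Name TEXT, Location TEXT, DateCreated TEXT, ClientName TEXT);
    CREATE TABLE epsilon_clients (ClientId INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE epsilon_jobs (ClientId INTEGER, Name TEXT, Location TEXT, DateCreated TEXT);
    INSERT INTO epsilon_clients VALUES (1, 'Acme'), (2, 'Globex');
    INSERT INTO alpha_jobs VALUES
        ('Roof',  'Leeds', '2020-01-01', 'Acme'),
        ('Fence', 'York',  '2020-02-01', 'Globex'),
        ('Shed',  'Hull',  '2020-03-01', 'Unknown Co');  -- no matching client
""")

# Copy every job; ClientId is looked up by name and stays NULL when no client matches.
conn.execute("""
    INSERT INTO epsilon_jobs (ClientId, Name, Location, DateCreated)
    SELECT c.ClientId, j.Name, j.Location, j.DateCreated
    FROM alpha_jobs AS j
    LEFT JOIN epsilon_clients AS c ON c.Name = j.ClientName
""")

copied = conn.execute(
    "SELECT ClientId, Name FROM epsilon_jobs ORDER BY Name"
).fetchall()
# Jobs whose client name could not be resolved, for manual follow-up.
unmatched = [name for client_id, name in copied if client_id is None]
print(copied, unmatched)
```

If two clients share the same name, the join would copy the job once per matching client, so it may be worth checking for duplicate names before running the real migration.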

insert into Epsilon.dbo.Jobs (ClientId, Name, Location, DateCreated)
select c.ClientID, a.Name, a.Location, a.DateCreated from Alpha.dbo.Jobs a
join Epsilon.dbo.Clients c on c.Name = a.ClientName
This is a pretty optimistic join, but even if it needs to be modified this should give you the general idea.

insert into Epsilon.dbo.Jobs (ClientId, Name, Location, DateCreated)
select c.ClientId, j.Name, j.Location, j.DateCreated
from Alpha.dbo.Jobs as j
inner join Epsilon.dbo.Clients as c on j.ClientName = c.Name
