How to use DISTINCT in VIEWS correctly - postgresql

I have two tables from which I want to make a join with some columns to provide a view for my java/hibernate application. It looks like this:
CREATE VIEW cc AS
SELECT DISTINCT ON (cust.id) cust.id,
cust.company,
cust.zip,
...
con.name,
con.forename,
...
FROM contacts con
LEFT JOIN customer cust ON con.customer = cust.id
ORDER BY cust.id
So far so good. Very simple.
If I make a SELECT on the view like:
SELECT *
FROM cc
WHERE name ilike '%schult%'
I get 13 results.
If I make the same query directly with the view statement
SELECT DISTINCT ON (cust.id) cust.id,
cust.company,
cust.zip,
...
con.name,
con.forename,
...
FROM contacts con
LEFT JOIN customer cust ON con.customer = cust.id
WHERE name ilike '%schult%'
ORDER BY cust.id
I got 75 results!
I figured out that it is the DISTINCT that corrupts the result. But why?
And how can I use it correctly?

Your two queries (view-based and direct) apply the condition in a different order:
the direct query filters on %schult% first and then applies DISTINCT ON
the view applies DISTINCT ON first and then filters on %schult%
Are you aware of how DISTINCT ON works?
It keeps the first row for each distinct value of the given expressions and discards the rest; which row counts as "first" is nondeterministic unless the ORDER BY defines a proper sort.
For instance:
Let's say we have a customer with id=1 and two connected contacts, one with name='Schultz' and one with name='Schmidt'.
The view-based select applies DISTINCT ON first and keeps the customer with just one contact (the first one, which is nondeterministic in this case); only then is name ilike '%schult%' applied. If DISTINCT ON happened to keep Schmidt, the Schultz row is already gone and the customer drops out of the result.
Recommended reading:
https://www.postgresql.org/docs/9.0/static/sql-select.html#SQL-DISTINCT
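To see the ordering effect in isolation, here is a minimal sketch in Python with an in-memory SQLite database. SQLite has no DISTINCT ON, so "keep one contact per customer" is emulated here by keeping the contact with the lowest id. The contacts.id column and the sample rows are made up for illustration; only the order of filtering versus de-duplication is the point.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (id INTEGER PRIMARY KEY, company TEXT);
CREATE TABLE contacts (id INTEGER PRIMARY KEY, customer INTEGER, name TEXT);
INSERT INTO customer VALUES (1, 'Acme'), (2, 'Globex');
INSERT INTO contacts VALUES (1, 1, 'Schmidt'), (2, 1, 'Schultz'), (3, 2, 'Schulte');
""")

# View-style: keep one contact per customer first, then filter by name.
filter_after = conn.execute("""
    SELECT customer, name FROM (
        SELECT con.customer, con.name
        FROM contacts con
        WHERE con.id = (SELECT MIN(c2.id) FROM contacts c2
                        WHERE c2.customer = con.customer)
    )
    WHERE name LIKE '%schult%'
    ORDER BY customer
""").fetchall()

# Direct-style: filter by name first, then keep one contact per customer.
filter_before = conn.execute("""
    SELECT con.customer, con.name
    FROM contacts con
    WHERE con.name LIKE '%schult%'
      AND con.id = (SELECT MIN(c2.id) FROM contacts c2
                    WHERE c2.customer = con.customer
                      AND c2.name LIKE '%schult%')
    ORDER BY con.customer
""").fetchall()

print(filter_after)   # customer 1 is missing: its kept contact was Schmidt
print(filter_before)  # customer 1 is present: Schultz survived the filter
```

Filtering first returns more rows than filtering afterwards, which is the same direction as the 75 versus 13 in the question.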

Related

Oracle Sql - Discarding outer select if inner select returns null, and avoiding multiple rows

Pre-Info: In our company a person is marked * if he is actively working. And there are people who changed their departments.
For a report I use 2 tables named COMPANY_PERSON_ALL and trifm_izinler4, joining person_id field as below.
I want to discard (don't list) the row, if the first inner select returns null.
And I want to prevent the second inner select returning multiple Departments.
select izn.person_id, izn.adi_soyadi, izn.company_id,
(select a.employee_status from COMPANY_PERSON_ALL a where a.employee_status = '*' and a.person_id = izn.person_id) as Status,
(select a.org_code from COMPANY_PERSON_ALL a where a.person_id = izn.person_id) as Department,
izn.hizmet_suresi, izn.kalan_izin
from trifm_izinler4 izn
where trunc(rapor_tarihi) = trunc(SYSDATE)
Can you help me overcome these 2 problems with the inner select statements?
Assuming you only want to see the department from the active person record, you can just join the two tables instead of using subquery expressions, and filter on that status:
select izn.person_id, izn.adi_soyadi, izn.company_id,
a.employee_status as status, a.org_code as department,
izn.hizmet_suresi, izn.kalan_izin
from trifm_izinler4 izn
join company_person_all a on a.person_id = izn.person_id
where rapor_tarihi >= trunc(SYSDATE)
-- and rapor_tarihi < trunc(SYSDATE) + 1 -- probably not needed
and a.employee_status = '*'
I've also changed the date comparison; if you compare using trunc(rapor_tarihi) then a normal index on that column can't be used, so it's generally better to compare the original value against a range. Since you're comparing against today's date you probably only need to look for values at or after midnight today, but if that column can have future dates then you can put an upper bound on the range of midnight tomorrow - which I've included but commented out.
If a person can be active in more than one department at a time then this will show all of those, but your wording suggests people are only active in one at a time. If you want to see a department for all active users, but not necessarily the one that has the active flag (or if there can be more than one active), then it's a bit more complicated, and you need to explain how you would want to choose which to show.
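As a rough illustration of how the join handles both problems, here is a small sketch using Python and an in-memory SQLite database instead of Oracle. The sample rows are invented, only a few of the columns are included, and the date filter is left out because trunc(SYSDATE) is Oracle-specific.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE company_person_all (person_id INTEGER, employee_status TEXT, org_code TEXT);
CREATE TABLE trifm_izinler4 (person_id INTEGER, adi_soyadi TEXT, kalan_izin INTEGER);
INSERT INTO company_person_all VALUES
    (1, NULL, 'SALES'),    -- old department record
    (1, '*',  'FINANCE'),  -- current, active record
    (2, NULL, 'IT');       -- person with no active record
INSERT INTO trifm_izinler4 VALUES (1, 'Person One', 12), (2, 'Person Two', 5);
""")

# Inner join plus the status filter: people without an active record are
# dropped, and only the department of the active record is shown.
rows = conn.execute("""
    SELECT izn.person_id, izn.adi_soyadi, a.employee_status, a.org_code, izn.kalan_izin
    FROM trifm_izinler4 izn
    JOIN company_person_all a ON a.person_id = izn.person_id
    WHERE a.employee_status = '*'
    ORDER BY izn.person_id
""").fetchall()

print(rows)
```

Person 2 has no active record and is not listed, and person 1 appears once with the department of the active record only.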

Update count in row after Insert

I'm completely new to SQL and have a question. I am using PostgreSQL.
I have two tables called "employees" and "offices".
The table "employees" has a list of unique employees, each having an office_ID (the office where they work).
What I want to do is count the number of appearances of each office_ID and store that count in the table "offices", which has a column called "number_of_employees" for each "office_ID".
Being completely new to SQL, the only thing I have managed that even comes close to this is, for example:
SELECT COUNT(*)
FROM employees
WHERE office_id = 203
But this only selects and gives the number of rows with the id "203", which then has to be manually entered into "number_of_employees".
What I want is a trigger function that updates the field "number_of_employees" when a new record is inserted into the table "employees".
A view is the way to go here.
I am assuming since you're completely new to SQL, you're unsure how to make it work (Edit: just seen your comment after posting :^D) .
The correct way to count employees for each office is:
SELECT office_id, COUNT(*) as employeeCount
FROM employees
GROUP BY office_id
Note how your WHERE office_id = XXX has been replaced by a GROUP BY office_id in order to count employees for all offices in a single query.
That being done, we can use it inside the view.
Be careful about the JOIN: I believe in your schema, an office may have no employee (for instance, right after you created it or right before you delete it). We will handle that part with a LEFT JOIN.
CREATE VIEW OfficeWithEmployeeCount AS
SELECT Offices.*, EmployeeCount
FROM Offices
LEFT OUTER JOIN (SELECT office_id, COUNT(*) as EmployeeCount FROM Employees GROUP BY office_id) T
ON Offices.office_id = T.office_id
Note: to avoid having NULL returned in EmployeeCount for empty offices, you may want to write:
CREATE VIEW OfficeWithEmployeeCount AS
SELECT Offices.*, COALESCE(EmployeeCount, 0) AS EmployeeCount
FROM ...
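To check that the view stays current without any trigger, here is a minimal sketch using Python with an in-memory SQLite database as a stand-in for PostgreSQL. The city column, the snake_case view name and the sample rows are assumptions for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE offices (office_id INTEGER PRIMARY KEY, city TEXT);
CREATE TABLE employees (employee_id INTEGER PRIMARY KEY, office_id INTEGER);
INSERT INTO offices VALUES (203, 'Oslo'), (204, 'Bergen');
INSERT INTO employees VALUES (1, 203), (2, 203);

CREATE VIEW office_with_employee_count AS
SELECT o.office_id, o.city, COALESCE(t.employee_count, 0) AS employee_count
FROM offices o
LEFT JOIN (SELECT office_id, COUNT(*) AS employee_count
           FROM employees
           GROUP BY office_id) t
       ON o.office_id = t.office_id;
""")

query = "SELECT office_id, employee_count FROM office_with_employee_count ORDER BY office_id"

before = conn.execute(query).fetchall()
# Insert a new employee; the view is recomputed on the next read.
conn.execute("INSERT INTO employees VALUES (3, 204)")
after = conn.execute(query).fetchall()

print(before)
print(after)
```

The count for office 204 goes from 0 to 1 after the insert, with nothing written to the offices table.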

Select multiple column with out code duplication while joining two table #active record # rails 2.3

Let us consider two tables
table1 - name,id,publisher_name,exp_date
table2-book_id,price,discount,last_date
I have to retrieve the name, id,publisher_name from table1 and price, last_date from table2
I wrote a code in active record rails 2
Table1.find(:all, :select => "table1s.name, table1s.publisher_name, table1s.id, table2s.last_date, table2s.price", :joins => "LEFT OUTER JOIN table2s ON table1s.id = table2s.book_id")
In this code, selecting multiple columns means writing the table name repeatedly.
I need a simpler way to write this that avoids the repetition.
If the selected columns are not present in both tables, you don't need to write the table name as a prefix. You also don't need to name table2 in front of "book_id". You only need the prefixes if the column names are ambiguous.
Table1.find(:all, :select => "name, publisher_name, id, last_date, price", :joins => "LEFT OUTER JOIN table2s ON table1s.id = book_id")
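The rule about prefixes comes from SQL itself, not from ActiveRecord, so it can be checked with plain SQL. Here is a minimal sketch using Python and an in-memory SQLite database; the pluralized table names table1s and table2s and the sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1s (id INTEGER PRIMARY KEY, name TEXT, publisher_name TEXT, exp_date TEXT);
CREATE TABLE table2s (book_id INTEGER, price INTEGER, discount INTEGER, last_date TEXT);
INSERT INTO table1s VALUES (1, 'Book A', 'Pub X', '2014-01-01'),
                           (2, 'Book B', 'Pub Y', '2014-06-01');
INSERT INTO table2s VALUES (1, 100, 10, '2013-12-31');
""")

# Every column name exists in only one table, so no prefixes are needed.
rows = conn.execute("""
    SELECT name, publisher_name, id, last_date, price
    FROM table1s
    LEFT OUTER JOIN table2s ON id = book_id
    ORDER BY id
""").fetchall()

# Once table2s gets its own id column, an unprefixed id becomes ambiguous.
conn.execute("ALTER TABLE table2s ADD COLUMN id INTEGER")
try:
    conn.execute("SELECT id FROM table1s LEFT OUTER JOIN table2s ON table1s.id = book_id")
    ambiguous = False
except sqlite3.OperationalError:
    ambiguous = True

print(rows)
print(ambiguous)
```

Tables created through Rails migrations usually get an id column by default, so in a real schema id is probably the one column that still needs its table prefix.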

select distinct from 2 columns but only 1 is duplicate

select a.subscriber_msisdn, war.created_datetime from
(
select distinct subscriber_msisdn from wiz_application_response
where application_item_id in
(select id from wiz_application_item where application_id=155)
and created_datetime between '2012-10-07 00:00' and '2012-11-15 00:00:54'
) a
left outer join wiz_application_response war on (war.subscriber_msisdn=a.subscriber_msisdn)
The subselect returns 11 rows, but when joined the query returns 18 (with duplicates). The objective of this query is only to add the date column to the 11 rows of the subselect.
Based on your description, it stands to reason that there are multiple created_datetime values for some of the subscriber_msisdn values which is what prompted you to use the distinct in the subquery to begin with. By joining the sub query to the original table you are defeating this. A cleaner way to write the query would be:
SELECT
war.subscriber_msisdn
, war.created_datetime
FROM
wiz_application_response war
LEFT JOIN wiz_application_item wai
ON war.application_item_id = wai.id
AND wai.application_id = 155
WHERE
war.created_datetime BETWEEN '2012-10-07 00:00' AND '2012-11-15 00:00:54'
To return only the rows from the war table that satisfy the criteria based on the wai table, this should be an inner join rather than an outer join; keep the LEFT JOIN only if you want all the rows from the war table that satisfy the created_datetime range regardless of the application_item_id condition.
This is my best guess based on the limited information I have about your tables and what I’m assuming you’re trying to accomplish. If this doesn’t get you what you are after, I will continue to offer other ideas based on additional information you could provide. Hope this works.
Can most probably be simplified to this:
SELECT DISTINCT ON (1)
r.subscriber_msisdn, r.created_datetime
FROM wiz_application_item i
JOIN wiz_application_response r ON r.application_item_id = i.id
WHERE i.application_id = 155
AND r.created_datetime BETWEEN '2012-10-07 00:00' AND '2012-11-15 00:00:54'
ORDER BY 1, 2 DESC -- to pick the latest created_datetime
Details depend on missing information.
More explanation here.
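For engines without DISTINCT ON, one row per subscriber_msisdn with the latest created_datetime can also be obtained with GROUP BY and MAX. This is a different technique from the query above, sketched here with Python and an in-memory SQLite database. The sample rows are invented, and dates are stored as text in a sortable format, which is an assumption of the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE wiz_application_item (id INTEGER PRIMARY KEY, application_id INTEGER);
CREATE TABLE wiz_application_response (
    subscriber_msisdn TEXT, application_item_id INTEGER, created_datetime TEXT);
INSERT INTO wiz_application_item VALUES (1, 155), (2, 999);
INSERT INTO wiz_application_response VALUES
    ('111', 1, '2012-10-08 10:00'),
    ('111', 1, '2012-11-01 09:30'),  -- second response from the same subscriber
    ('222', 1, '2012-10-20 12:00'),
    ('333', 2, '2012-10-21 08:00');  -- belongs to another application
""")

# One row per subscriber, carrying the latest created_datetime in the range.
rows = conn.execute("""
    SELECT r.subscriber_msisdn, MAX(r.created_datetime) AS created_datetime
    FROM wiz_application_item i
    JOIN wiz_application_response r ON r.application_item_id = i.id
    WHERE i.application_id = 155
      AND r.created_datetime BETWEEN '2012-10-07 00:00' AND '2012-11-15 00:00:54'
    GROUP BY r.subscriber_msisdn
    ORDER BY r.subscriber_msisdn
""").fetchall()

print(rows)
```

Subscriber 111 has two responses in the range but appears only once, which is the duplication the original join could not avoid.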

T-SQL - How to write query to get records that match ALL records in a many to many join

(I don't think I have titled this question correctly - but I don't know how to describe it)
Here is what I am trying to do:
Let's say I have a Person table that has a PersonID field. And let's say that a Person can belong to many Groups. So there is a Group table with a GroupID field and a GroupMembership table that is a many-to-many join between the two tables and the GroupMembership table has a PersonID field and a GroupID field. So far, it is a simple many to many join.
Given a list of GroupIDs I would like to be able to write a query that returns all of the people that are in ALL of those groups (not any one of those groups). And the query should be able to handle any number of GroupIDs. I would like to avoid dynamic SQL.
Is there some simple way of doing this that I am missing?
Thanks,
Corey
select person_id, count(*) from groupmembership
where group_id in ([your list of group ids])
group by person_id
having count(*) = [size of your list of group ids]
Edited: thank you dotjoe!
Basically you are looking for persons for whom there is no group they are not a member of (as written this checks every row in the Group table; to test a specific list, restrict g to those GroupIDs), so:
select *
from Person p
where not exists (
select 1
from [Group] g
where not exists (
select 1
from GroupMembership gm
where gm.PersonID = p.PersonID
and gm.GroupID = g.GroupID
)
)
You're basically not going to avoid "dynamic" SQL in the sense of dynamically generating the query at query time. There's no way to hand a list around in SQL (well, there is, table variables, but getting them into the system from C# is either impossible (2005 & below) or else annoying (2008)).
One way that you could do it with multiple queries is to insert your list into a work table (probably a process-keyed table) and join against that table. The only other option would be to use a dynamic query such as the ones specified by Jonathan and hongliang.
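For what it's worth, the count-based query from the first answer can be generated with placeholders, so that only the number of parameters is dynamic and not the values. A minimal sketch in Python with an in-memory SQLite database follows; the helper name people_in_all_groups and the sample memberships are hypothetical, and COUNT(DISTINCT ...) is used in case a membership row is duplicated.

```python
import sqlite3

def people_in_all_groups(conn, group_ids):
    """Return PersonIDs that belong to every group in group_ids."""
    ids = sorted(set(group_ids))
    if not ids:
        return []
    # Only the placeholder list is generated; the values stay parameters.
    placeholders = ", ".join("?" for _ in ids)
    sql = f"""
        SELECT PersonID
        FROM GroupMembership
        WHERE GroupID IN ({placeholders})
        GROUP BY PersonID
        HAVING COUNT(DISTINCT GroupID) = ?
        ORDER BY PersonID
    """
    return [row[0] for row in conn.execute(sql, ids + [len(ids)])]

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE GroupMembership (PersonID INTEGER, GroupID INTEGER);
INSERT INTO GroupMembership VALUES
    (1, 10), (1, 20), (1, 30),
    (2, 10), (2, 30),
    (3, 20);
""")

in_10_and_20 = people_in_all_groups(conn, [10, 20])
in_10_and_30 = people_in_all_groups(conn, [10, 30])
print(in_10_and_20, in_10_and_30)
```

Only person 1 is in both groups 10 and 20, while persons 1 and 2 are both in groups 10 and 30.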